Statistical analysis on testing of an entangled state 
based on Poisson distribution framework 
A hypothesis testing scheme for entanglement has been formulated based on the Poisson distribution framework instead of the POVM framework. Three designs were proposed to test the entangled states in this framework. The designs were evaluated in terms of the asymptotic variance. It has been shown that the optimal time allocation between the coincidence and anti-coincidence measurement bases improves the conventional testing method. The test can be further improved by optimizing the time allocation between the anti-coincidence bases.
I. INTRODUCTION

Entangled states are an essential resource for various quantum information processing tasks. Hence, it is required to generate maximally entangled states. However, for practical use, it is more essential to guarantee the quality of the generated entangled states. Statistical hypothesis testing is a standard method for guaranteeing the quality of industrial products. Therefore, a method for the statistical testing of maximally entangled states is much needed.

Quantum state estimation and quantum state tomography are known as methods for identifying an unknown state. Quantum state tomography has recently been applied to obtain full information on the $4 \times 4$ density matrix. However, if the purpose is to test entanglement, it is more economical to concentrate on checking the degree of entanglement. Such a study has been done by Tsuda et al. as an optimization problem of POVMs. However, an implemented quantum measurement cannot always be regarded as an application of a POVM to a single-particle system or as multiple applications of a POVM to single-particle systems. In particular, in quantum optics, the following measurement, which is not described by a POVM on a single-particle system, is often realized. The number of generated particles is probabilistic. We prepare a filter corresponding to a projection $P$ and detect the number of particles passing through the filter. If the number of generated particles obeys a Poisson distribution, then, as mentioned in Section II, the number of detected particles obeys another Poisson distribution whose average is determined by the density and the projection $P$.

In this kind of measurement, if no particle is detected, we cannot decide whether no particle was generated or a particle was generated but did not pass through the filter. If we can detect the number of generated particles as well as the number of passing particles, the measurement can be regarded as multiple applications of the POVM $\{P, I - P\}$. In this case, the number of applications of the POVM is the random variable corresponding to the number of generated particles. Also, we can only detect the empirical distribution. Hence, the obtained information can mostly be discussed in terms of the POVM $\{P, I - P\}$.

However, if imperfections make it impossible to distinguish the two events, the analysis of the obtained information cannot be reduced to the analysis of POVMs. Hence, it is necessary to analyze the performance of estimation and/or hypothesis testing based on the Poisson distribution describing the number of detected particles. If we discuss only the ultimate bound on the accuracy of estimation and/or hypothesis testing, we do not have to treat such imperfect measurements. However, since several realistic measurements have such imperfections, it is very important to optimize our measurement within this class of imperfect measurements.

In this paper, our measurement is restricted to detecting the number of particles passing through the filter corresponding to a projection $P$. We apply this formulation to the testing of maximally entangled states on two-qubit systems (two-level systems), each of which is spanned by two vectors $|H\rangle$ and $|V\rangle$. Since the target system is a bipartite system, it is natural to restrict our measurement to local operations and classical communications (LOCC). In this paper, for a simple realization, we restrict our measurements to counting the simultaneous detections, at both parties, of the particles passing through the respective filters. We also fix the total measurement time $t$ and optimize the allocation of the time among the filters at both parties.

Our results give the following characterizations. If the average number of generated particles is known, our choice is between counting the coincidence events and counting the anti-coincidence events. When the true state is close to the target maximally entangled state $|\Phi^{(+)}\rangle := \frac{1}{\sqrt{2}}(|HH\rangle + |VV\rangle)$ (that is, the fidelity between them is greater than 1/4), the detection of anti-coincidence events is better than that of coincidence events. This result implies that the indistinguishability between the coincidence events and the non-generation event loses less information than that between the anti-coincidence events and the non-generation event.

This fact also holds even if we take into account the effect of dark counts. In this discussion, in order to remove the bias concerning the direction of the difference, we assume equal time allocation among the vectors $\{|HV\rangle, |VH\rangle, |DX\rangle, |XD\rangle, |RR\rangle, |LL\rangle\}$, which correspond to the anti-coincidence events, and among the vectors $\{|HH\rangle, |VV\rangle, |DD\rangle, |XX\rangle, |RL\rangle, |LR\rangle\}$, which correspond to the coincidence events, where $|D\rangle := \frac{1}{\sqrt{2}}(|H\rangle + |V\rangle)$, $|X\rangle := \frac{1}{\sqrt{2}}(|H\rangle - |V\rangle)$, $|R\rangle := \frac{1}{\sqrt{2}}(|H\rangle + i|V\rangle)$, and $|L\rangle := \frac{1}{\sqrt{2}}(|H\rangle - i|V\rangle)$. Although Barbieri et al. proposed detecting the anti-coincidence events for measuring an entanglement witness, they did not prove the superiority of detecting the anti-coincidence events in the framework of mathematical statistics.

However, the average number of generated particles is usually unknown. In this case, we cannot estimate how close the true state is to the target maximally entangled state from the detection of anti-coincidence events alone. Hence, we need to count the coincidence events as additional information. To resolve this problem, the visibility method, which is a conventional method for checking entanglement, usually uses equal allocation between anti-coincidence events and coincidence events. However, since this method measures the coincidence events and the anti-coincidence events in only one or two bases, there is a bias concerning the direction of the difference. In order to remove this bias, we consider the detection method with equal time allocation among all the vectors $\{|HV\rangle, |VH\rangle, |DX\rangle, |XD\rangle, |RR\rangle, |LL\rangle\}$ and $\{|HH\rangle, |VV\rangle, |DD\rangle, |XX\rangle, |RL\rangle, |LR\rangle\}$, and call it the modified visibility method.

In this paper, we also examine the detection of the total flux, which can be realized by detecting the particles without the filter. We optimize the time allocation among these three detections. We found that the optimal time allocation depends on the fidelity between the true state and the target maximally entangled state. If our purpose is estimating the fidelity $F$, we cannot directly apply the optimal time allocation. However, if the purpose is testing whether the fidelity $F$ is greater than a given threshold $F_0$, the optimal allocation at $F_0$ gives the optimal testing method.

If the fidelity $F$ is less than a critical value, the optimal allocation is given by an allocation between the anti-coincidence vectors and the coincidence vectors (the ratio depends on $F$). Otherwise, it is given by an allocation only between the anti-coincidence vectors and the total flux. This fact is valid even if dark counts exist. If the dark count is greater than a certain value, the optimal time allocation is always given by an allocation between the anti-coincidence vectors and the coincidence vectors.

Further, we consider the optimal allocation among the anti-coincidence vectors when the average number of generated particles is known. The optimal allocation depends on the direction of the difference between the true state and the target state. Since the direction is usually unknown, this optimal allocation does not seem useful. However, by adaptively deciding the time allocation, we can apply the optimal time allocation. We propose to apply this optimal allocation by use of the two-stage method. Further, taking into account the complexity of the testing methods and the dark counts, we give a testing procedure for entanglement based on the two-stage method. In addition, the proposed designs of experiments were demonstrated by Hayashi et al. [14] using two-photon pairs generated by spontaneous parametric down conversion (SPDC).

In this article, we reformulate hypothesis testing so as to be applicable to the Poisson distribution framework, and demonstrate the effectiveness of the optimized time allocation in the entanglement test. This article is organized as follows. Section II defines the Poisson distribution framework and gives the hypothesis testing scheme for entanglement. Section III gives the mathematical formulation of statistical hypothesis testing. Sections IV and V give the fundamental properties of hypothesis testing: Section IV introduces the likelihood ratio test and its modification, and Section V gives the asymptotic theory of hypothesis testing. Sections VI-IX are devoted to the designs of the time allocation between the coincidence and anti-coincidence bases: Section VI defines the modified visibility method, Section VII optimizes the time allocation when the total photon flux $\lambda$ is unknown, Section VIII gives the results with known $\lambda$, and Section IX compares the designs in terms of the asymptotic variance. Section X gives a further improvement by optimizing the time allocation between the anti-coincidence bases. The appendices give the details of the proofs used in the optimization.



II. HYPOTHESIS TESTING SCHEME FOR 
ENTANGLEMENT IN POISSON DISTRIBUTION 
FRAMEWORK 

Let $\mathcal{H}$ be the Hilbert space of interest, and let $P$ be the projection corresponding to our filter. If we assume that the generation process at each time is identical and independent, the total number $n$ of particles generated during the time $t$ obeys the Poisson distribution $\mathrm{Poi}(\lambda t)(n) := e^{-\lambda t} \frac{(\lambda t)^n}{n!}$. Hence, when the density of the true state is $\sigma$, the probability of the number $k$ of detected particles is given as

$$\sum_{n=k}^{\infty} \mathrm{Poi}(\lambda t)(n) \binom{n}{k} (\mathrm{Tr}\, P\sigma)^k (1 - \mathrm{Tr}\, P\sigma)^{n-k} = \mathrm{Poi}(\lambda t\, \mathrm{Tr}\, P\sigma)(k). \tag{1}$$
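The identity (1) is the usual thinning property of the Poisson distribution, and it can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the truncation point `n_max`, and the numerical values of $\lambda t$ and $\mathrm{Tr}\, P\sigma$ are illustrative assumptions, not quantities taken from this paper.

```python
import math


def poisson_pmf(mean: float, k: int) -> float:
    """Probability that a Poisson variable with the given (positive) mean equals k."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def thinned_pmf(rate_time: float, pass_prob: float, k: int, n_max: int = 200) -> float:
    """Left-hand side of Eq. (1), with the sum over n truncated at n_max."""
    total = 0.0
    for n in range(k, n_max + 1):
        binom = math.comb(n, k) * pass_prob ** k * (1.0 - pass_prob) ** (n - k)
        total += poisson_pmf(rate_time, n) * binom
    return total


lam_t = 12.0  # hypothetical value of lambda * t
p = 0.3       # hypothetical value of Tr P sigma

# Largest deviation between both sides of Eq. (1) over k = 0, ..., 24.
max_gap = max(
    abs(thinned_pmf(lam_t, p, k) - poisson_pmf(lam_t * p, k)) for k in range(25)
)
print(max_gap)
```

The truncation at `n_max` is harmless here because the Poisson tail beyond 200 is negligible for a mean of 12; for larger means the cutoff would have to grow accordingly.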

[Figure: a generator with Poisson-distributed output, followed by a filter $P$ and a detector.]

FIG. 1: Experimental scheme in the Poisson distribution framework

In fact, if we treat the Fock space generated by $\mathcal{H}$ instead of the single-particle system $\mathcal{H}$, this measurement can be described by a POVM. However, since this POVM does not have a simple form, it is more suitable to treat this measurement in the form (1).

Further, if we erroneously detect $k'$ additional particles with the probability $\mathrm{Poi}(\delta t)(k')$, the probability of the number $k$ of detected particles is equal to

$$\sum_{k'=0}^{k} \mathrm{Poi}(\lambda t\, \mathrm{Tr}\, P\sigma)(k - k')\, \mathrm{Poi}(\delta t)(k') = \mathrm{Poi}((\lambda\, \mathrm{Tr}\, P\sigma + \delta)t)(k).$$

This kind of incorrect detection is called a dark count. Further, since we consider the bipartite case, i.e., the case where $\mathcal{H} = \mathbb{C}^2 \otimes \mathbb{C}^2$, we assume that our projection $P$ has the separable form $P_1 \otimes P_2$.
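The dark-count formula above states that the convolution of two Poisson distributions is again a Poisson distribution whose mean is the sum of the two means. A short numerical check is sketched below; the function names and the chosen means are hypothetical and serve only as an illustration.

```python
import math


def poisson_pmf(mean: float, k: int) -> float:
    """Probability that a Poisson variable with the given (positive) mean equals k."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def count_pmf_with_dark(signal_mean: float, dark_mean: float, k: int) -> float:
    """Probability of k detections when k' of them are dark counts (convolution)."""
    return sum(
        poisson_pmf(signal_mean, k - k_dark) * poisson_pmf(dark_mean, k_dark)
        for k_dark in range(k + 1)
    )


signal = 3.6  # hypothetical value of lambda * t * Tr P sigma
dark = 0.8    # hypothetical value of delta * t

# Compare the convolution with a single Poisson distribution of mean signal + dark.
conv_gap = max(
    abs(count_pmf_with_dark(signal, dark, k) - poisson_pmf(signal + dark, k))
    for k in range(30)
)
print(conv_gap)
```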

In this paper, under the above assumption, we discuss hypothesis testing when the target state is the maximally entangled state $|\Phi^{(+)}\rangle$, while Usami et al. discussed state estimation under this assumption. Here we measure the degree of entanglement by the fidelity between the generated state and the target state:

$$F = \langle \Phi^{(+)} | \sigma | \Phi^{(+)} \rangle. \tag{2}$$
The purpose of the test is to guarantee that the state is sufficiently close to the maximally entangled state with a certain significance. That is, we are required to disprove that the fidelity $F$ is less than a threshold $F_0$ with a small error probability. In mathematical statistics, this situation is formulated as hypothesis testing; we introduce the null hypothesis $H_0$ that the entanglement is not sufficient and the alternative hypothesis $H_1$ that the entanglement is sufficient:

$$H_0: F \le F_0 \ \text{ v.s. } \ H_1: F > F_0, \tag{3}$$

with a threshold $F_0$.



Visibility is an indicator of entanglement commonly used in experiments, and is calculated as follows. First, A's measurement vector $|x_A\rangle$ is fixed; then the measurement $|x_A, x_B\rangle$ is performed while rotating B's measurement vector $|x_B\rangle$ to obtain the maximum and minimum numbers of counts, $n_{\max}$ and $n_{\min}$. We need to make the measurement with at least two bases of A in order to exclude the possibility of classical correlation. We may choose, for example, the two bases $\{|H\rangle, |V\rangle\}$ and $\{|D\rangle, |X\rangle\}$ as $|x_A\rangle$. Finally, the visibility is given by the ratio between $n_{\max} - n_{\min}$ and $n_{\max} + n_{\min}$ for the respective measurement basis $|x_A\rangle$ of A. However, our decision will contain a bias if we choose only two bases as A's measurement basis $|x_A\rangle$. Hence, we cannot estimate the fidelity between the target maximally entangled state and the given state in a statistically proper way from the visibility.

Since the equation

$$|HH\rangle\langle HH| + |VV\rangle\langle VV| + |DD\rangle\langle DD| + |XX\rangle\langle XX| + |RL\rangle\langle RL| + |LR\rangle\langle LR| = 2|\Phi^{(+)}\rangle\langle\Phi^{(+)}| + I \tag{4}$$

holds, we can estimate the fidelity by measuring the sum of the counts of the following vectors: $|HH\rangle$, $|VV\rangle$, $|DD\rangle$, $|XX\rangle$, $|RL\rangle$, and $|LR\rangle$, when $\lambda$ is known. This is because the sum $n_1 := n_{HH} + n_{VV} + n_{DD} + n_{XX} + n_{RL} + n_{LR}$ obeys the Poisson distribution with the expectation value $\left(\lambda \frac{1+2F}{6} + \delta\right) t_1$, where the measurement time for each vector is $\frac{t_1}{6}$. We call these vectors the coincidence vectors because they correspond to the coincidence events.

However, since the parameter $\lambda$ is usually unknown, we need to perform another measurement on different vectors to obtain additional information. Since

$$|HV\rangle\langle HV| + |VH\rangle\langle VH| + |XD\rangle\langle XD| + |DX\rangle\langle DX| + |RR\rangle\langle RR| + |LL\rangle\langle LL| = 2I - 2|\Phi^{(+)}\rangle\langle\Phi^{(+)}| \tag{5}$$

also holds, we can estimate the fidelity by measuring the sum of the counts of the following vectors: $|HV\rangle$, $|VH\rangle$, $|DX\rangle$, $|XD\rangle$, $|RR\rangle$, and $|LL\rangle$. The sum $n_2 := n_{HV} + n_{VH} + n_{DX} + n_{XD} + n_{RR} + n_{LL}$ obeys the Poisson distribution $\mathrm{Poi}\left(\left(\lambda \frac{2-2F}{6} + \delta\right) t_2\right)$, where the measurement time for each vector is $\frac{t_2}{6}$. Combining the two measurements, we can estimate the fidelity without knowledge of $\lambda$. We call these vectors the anti-coincidence vectors because they correspond to the anti-coincidence events.
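The operator identities (4) and (5) can be verified directly by building the twelve product vectors in the basis $|HH\rangle, |HV\rangle, |VH\rangle, |VV\rangle$. The sketch below does this with plain Python lists; the helper names are hypothetical, and $|R\rangle$ and $|L\rangle$ are taken as $\frac{1}{\sqrt{2}}(|H\rangle \pm i|V\rangle)$, as in the Introduction.

```python
import math
from itertools import product

S = 1 / math.sqrt(2)
# Single-photon polarization vectors in the (H, V) basis.
KETS = {
    "H": (1, 0), "V": (0, 1),
    "D": (S, S), "X": (S, -S),
    "R": (S, 1j * S), "L": (S, -1j * S),
}


def pair_ket(a, b):
    """Tensor product |a>|b> as a length-4 list in the basis HH, HV, VH, VV."""
    return [x * y for x in KETS[a] for y in KETS[b]]


def projector_sum(labels):
    """Sum of the rank-one projectors |ab><ab| over two-letter labels."""
    total = [[0j] * 4 for _ in range(4)]
    for a, b in labels:
        v = pair_ket(a, b)
        for i, j in product(range(4), repeat=2):
            total[i][j] += v[i] * v[j].conjugate()
    return total


def max_diff(m, target):
    """Largest entrywise deviation between two 4x4 matrices."""
    return max(abs(m[i][j] - target[i][j]) for i in range(4) for j in range(4))


phi = [S, 0, 0, S]  # |Phi^(+)> = (|HH> + |VV>) / sqrt(2)
phi_proj = [[phi[i] * phi[j] for j in range(4)] for i in range(4)]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

coincidence = projector_sum(["HH", "VV", "DD", "XX", "RL", "LR"])
anti = projector_sum(["HV", "VH", "DX", "XD", "RR", "LL"])

rhs4 = [[2 * phi_proj[i][j] + identity[i][j] for j in range(4)] for i in range(4)]
rhs5 = [[2 * identity[i][j] - 2 * phi_proj[i][j] for j in range(4)] for i in range(4)]

gap_coincidence = max_diff(coincidence, rhs4)  # deviation from Eq. (4)
gap_anti = max_diff(anti, rhs5)                # deviation from Eq. (5)
print(gap_coincidence, gap_anti)
```

Both deviations vanish up to rounding, which also reflects the fact that the twelve projectors together sum to $3I$, one identity per measurement basis.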

We can also consider a different type of measurement to obtain information on $\lambda$. If we prepare our device to detect all photons, i.e., the case where the projection is $I \otimes I$, the detected number $n_3$ obeys the distribution $\mathrm{Poi}((\lambda + \delta) t_3)$ with the measurement time $t_3$. We will refer to it as the total flux measurement. In the following, we consider the best time allocation for estimation and testing of the fidelity by applying methods of mathematical statistics. We will assume that $\lambda$ is known or estimated from the detected number $n_3$.
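As an illustration of how the coincidence count $n_1$ and the anti-coincidence count $n_2$ can be combined without knowing $\lambda$, the following sketch solves the two expectation values for $\lambda$ and $F$. It is a simple method-of-moments calculation under the assumption of no dark counts ($\delta = 0$), not the testing procedure developed in this paper; the function name and the numbers are hypothetical.

```python
def estimate_flux_and_fidelity(n1: float, t1: float, n2: float, t2: float):
    """Moment estimates of (lambda, F) from coincidence and anti-coincidence counts.

    Assumes delta = 0, so that E[n1] = lambda * (1 + 2F) / 6 * t1 and
    E[n2] = lambda * (2 - 2F) / 6 * t2. The rates then add up to lambda / 2.
    """
    r1 = n1 / t1  # observed coincidence rate
    r2 = n2 / t2  # observed anti-coincidence rate
    flux = 2.0 * (r1 + r2)
    fidelity = (2.0 * r1 - r2) / (2.0 * (r1 + r2))
    return flux, fidelity


# Hypothetical noiseless counts for lambda = 600, F = 0.9, t1 = t2 = 1:
# E[n1] = 600 * 2.8 / 6 = 280 and E[n2] = 600 * 0.2 / 6 = 20.
flux_hat, fidelity_hat = estimate_flux_and_fidelity(280.0, 1.0, 20.0, 1.0)
print(flux_hat, fidelity_hat)
```

With real data the counts fluctuate, so the estimate of $F$ can fall slightly outside $[0, 1]$; a nonzero dark count rate would have to be subtracted from both rates first.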



III. HYPOTHESIS TESTING FOR 
PROBABILITY DISTRIBUTIONS 

A. Formulation 

In this section, we review the fundamentals of hypothesis testing for probability distributions. Suppose that a random variable $X$ is distributed according to a probability measure $P_\theta$ identified by the unknown parameter $\theta$. We also assume that the unknown parameter $\theta$ belongs to one of the mutually disjoint sets $\Theta_0$ and $\Theta_1$. When we want to guarantee that the true parameter $\theta$ belongs to the set $\Theta_1$ with a certain significance, we choose the null hypothesis $H_0$ and the alternative hypothesis $H_1$ as

$$H_0: \theta \in \Theta_0 \ \text{ versus } \ H_1: \theta \in \Theta_1. \tag{6}$$



Then, our decision method is described by a test, which is a function $\phi(x)$ taking values in $\{0, 1\}$; $H_0$ is rejected if 1 is observed, and $H_0$ is not rejected if 0 is observed. That is, we make our decision only when 1 is observed, and do not otherwise. This is because the purpose is to accept $H_1$ by rejecting $H_0$ while guaranteeing the quality of our decision, and is neither to reject $H_1$ nor to accept $H_0$. Therefore, we call the region $\{x \mid \phi(x) = 1\}$ the rejection region. The test $\phi$ can be defined by its rejection region. In fact, in Section II we chose the hypothesis that the fidelity is less than the given threshold $F_0$ as the null hypothesis $H_0$. This formulation is natural because our purpose is to guarantee that the fidelity is not less than the given threshold $F_0$.

From a theoretical viewpoint, we often consider randomized tests, in which we make the decision probabilistically for given data. Such a test is given by a function $\phi$ mapping to the interval $[0, 1]$. When we observe the data $x$, $H_0$ is rejected with the probability $\phi(x)$. In the following, we treat randomized tests as well as deterministic tests.

In statistical hypothesis testing, we minimize the error probabilities of the test $\phi$. There are two types of errors. The type one error is the case where $H_0$ is rejected though it is true. The type two error is the converse case, where $H_0$ is accepted though it is false. Hence, the type one error probability is given by $P_\theta(\phi)$ ($\theta \in \Theta_0$), and the type two error probability is given by $1 - P_{\theta'}(\phi)$ ($\theta' \in \Theta_1$), where

$$P_\theta(\phi) := \int \phi(x)\, dP_\theta(x).$$



It is in general impossible to minimize both $P_\theta(\phi)$ and $1 - P_{\theta'}(\phi)$ simultaneously because of a trade-off relation between them. Since we make our decision with a guarantee of its quality only when 1 is observed, it is definitely required that the type one error probability $P_\theta(\phi)$ be less than a certain constant $\alpha$. For this reason, we minimize the type two error probability $1 - P_{\theta'}(\phi)$ under the condition $P_\theta(\phi) \le \alpha$. The constant $\alpha$ in the condition is called the risk probability, which guarantees the quality of our decision. If the risk probability is large, our decision is less reliable. Under this constraint on the risk probability, we maximize the probability of rejecting the hypothesis $H_0$ when the true parameter is $\theta' \in \Theta_1$. This probability is given as $P_{\theta'}(\phi)$, and is called the power of $\phi$. Hence, a test $\phi$ of the risk probability $\alpha$ is said to be most powerful (MP) at $\theta' \in \Theta_1$ if $P_{\theta'}(\phi) \ge P_{\theta'}(\psi)$ holds for any test $\psi$ of the risk probability $\alpha$. Then, a test is said to be uniformly most powerful (UMP) if it is MP at any $\theta' \in \Theta_1$.



B. p-values

In hypothesis testing, we usually fix our test before applying it to data. However, we sometimes focus on the minimum risk probability among the tests in a class $T$ that reject the hypothesis $H_0$ for given data. This value is called the p-value, which depends on the observed data $x$ as well as on the subset $\Theta_0$ to be rejected.

In fact, in order to define the p-value, we have to fix a class $T$ of tests. Then, for $x$ and $\Theta_0$, the p-value is defined as

$$\min_{\phi \in T:\, \phi(x) = 1}\ \max_{\theta \in \Theta_0} P_\theta(\phi). \tag{7}$$

Since the p-value expresses the risk of rejecting the hypothesis $H_0$, this concept is useful for comparison among several designs of experiment.

Note that if we are allowed to choose any function $\phi$ as a test, the above minimum is attained by the function $\delta_x$:

$$\delta_x(y) := \begin{cases} 0 & \text{if } y \neq x \\ 1 & \text{if } y = x. \end{cases} \tag{8}$$

In this case, the p-value is $\max_{\theta \in \Theta_0} P_\theta(x)$. However, the function $\delta_x$ is unnatural as a test. Hence, we should fix a class of tests to define the p-value.



IV. LIKELIHOOD TEST 

A. Definition 

In mathematical statistics, the likelihood ratio tests are often used as a class of standard tests [9]. This kind of test often provides the UMP test in some typical cases. When both $\Theta_0$ and $\Theta_1$ consist of single elements as $\Theta_0 = \{\theta_0\}$ and $\Theta_1 = \{\theta_1\}$, the likelihood ratio test $\phi_{\mathrm{LR},r}$ is defined as

$$\phi_{\mathrm{LR},r}(x) := \begin{cases} 0 & \text{if } P_{\theta_0}(x)/P_{\theta_1}(x) \ge r, \\ 1 & \text{if } P_{\theta_0}(x)/P_{\theta_1}(x) < r, \end{cases}$$

where $r$ is a constant, and the ratio $P_{\theta_0}(x)/P_{\theta_1}(x)$ is called the likelihood ratio. From the definition, any test $\phi$ satisfies

$$(rP_{\theta_1} - P_{\theta_0})(\phi_{\mathrm{LR},r}) \ge (rP_{\theta_1} - P_{\theta_0})(\phi). \tag{9}$$
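When a likelihood ratio test $\phi_{\mathrm{LR},r}$ satisfies

$$\alpha = P_{\theta_0}(\phi_{\mathrm{LR},r}), \tag{10}$$

the test $\phi_{\mathrm{LR},r}$ is MP of level $\alpha$. Indeed, when a test $\phi$ satisfies $P_{\theta_0}(\phi) \le \alpha$,

$$-\alpha + rP_{\theta_1}(\phi) \le -P_{\theta_0}(\phi) + rP_{\theta_1}(\phi) \le -P_{\theta_0}(\phi_{\mathrm{LR},r}) + rP_{\theta_1}(\phi_{\mathrm{LR},r}) = -\alpha + rP_{\theta_1}(\phi_{\mathrm{LR},r}).$$

Hence, $1 - P_{\theta_1}(\phi) \ge 1 - P_{\theta_1}(\phi_{\mathrm{LR},r})$. This is known as Neyman-Pearson's fundamental lemma [15].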

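The likelihood ratio test is generalized to the cases where $\Theta_0$ or $\Theta_1$ has at least two elements as

$$\phi_{\mathrm{LR},r}(x) := \begin{cases} 0 & \text{if } \dfrac{\sup_{\theta \in \Theta_0} P_\theta(x)}{\sup_{\theta \in \Theta_1} P_\theta(x)} \ge r, \\[2ex] 1 & \text{if } \dfrac{\sup_{\theta \in \Theta_0} P_\theta(x)}{\sup_{\theta \in \Theta_1} P_\theta(x)} < r. \end{cases}$$

Usually, in order to guarantee a small risk probability, the likelihood ratio $r$ is chosen as $r < 1$.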



B. Monotone Likelihood Ratio Test 

In cases where the hypothesis is one-sided, that is, the parameter space is an interval of $\mathbb{R}$ and the hypothesis is given as

$$H_0: \theta \ge \theta_0 \ \text{ versus } \ H_1: \theta < \theta_0, \tag{11}$$

we often use so-called interval tests because of their optimality under some conditions as well as their naturalness.

When the likelihood ratio $P_\theta(x)/P_\eta(x)$ is monotone increasing in $x$ for any $\theta, \eta$ such that $\theta > \eta$, the likelihood ratio is called monotone. In this case, the likelihood ratio test $\phi_{\mathrm{LR},r}$ between $P_{\theta_0}$ and $P_{\theta_1}$ is UMP of level $\alpha := P_{\theta_0}(\phi_{\mathrm{LR},r})$, where $\theta_1$ is an arbitrary element satisfying $\theta_1 < \theta_0$.

Indeed, many important examples satisfy this condition. Hence, it is convenient to give its proof here.

From the monotonicity, the likelihood ratio test $\phi_{\mathrm{LR},r}$ has the form

$$\phi_{\mathrm{LR},r}(x) = \begin{cases} 1 & x < x_0 \\ 0 & x \ge x_0 \end{cases} \tag{12}$$

with a threshold value $x_0$. Since the monotonicity implies $P_{\theta_0}(\phi_{\mathrm{LR},r}) \ge P_\theta(\phi_{\mathrm{LR},r})$ for any $\theta \in \Theta_0$, it follows from the Neyman-Pearson lemma that the likelihood ratio test $\phi_{\mathrm{LR},r}$ is MP of level $\alpha$ at $\theta_1$. From (12), the likelihood ratio test $\phi_{\mathrm{LR},r}$ is also a likelihood ratio test between $P_{\theta_0}$ and $P_\eta$, where $\eta$ is another element satisfying $\eta < \theta_0$. Hence, the test $\phi_{\mathrm{LR},r}$ is also MP of level $\alpha$ at $\eta$.



From the above discussion, it is suitable to treat the p-value based on the class of likelihood ratio tests. In this case, when we observe $x_0$, the p-value is equal to

$$\int_{-\infty}^{x_0} P_{\theta_0}(dx). \tag{13}$$



C. One-Parameter Exponential Family 

In mathematical statistics, exponential families are known as a class of typical statistical models. A family of probability distributions $\{P_\theta \mid \theta \in \Theta\}$ is called an exponential family when there exists a random variable $x$ such that

$$P_\theta(x) := P_0(x) \exp(\theta x + g(\theta)), \tag{14}$$

where $g(\theta) := -\log \int \exp(\theta x) P_0(dx)$.

It is known that this class of families includes, for example, the Poisson distributions, the normal distributions, and the binomial distributions. In this case, the likelihood ratio $\frac{P_{\theta_0}(x)}{P_{\theta_1}(x)} = \exp((\theta_0 - \theta_1)x + g(\theta_0) - g(\theta_1))$ is monotone in $x$ for $\theta_0 > \theta_1$. Hence, the likelihood ratio test is UMP for the hypothesis (11). Note that this argument is valid even if we choose a different parameter, provided that the family has a parameter satisfying (14).

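For example, in the case of the normal distribution $P_\theta(x) = \frac{1}{\sqrt{2\pi V}} e^{-\frac{(x-\theta)^2}{2V}}$, the UMP test $\phi_{\mathrm{UMP},\alpha}$ of the level $\alpha$ is given as

$$\phi_{\mathrm{UMP},\alpha}(x) = \begin{cases} 1 & \text{if } x < \theta_0 - \epsilon_\alpha \sqrt{V} \\ 0 & \text{if } x \ge \theta_0 - \epsilon_\alpha \sqrt{V}, \end{cases} \tag{15}$$

where

$$\Phi(-\epsilon_\alpha) = \alpha, \qquad \Phi(\epsilon) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\epsilon} e^{-\frac{x^2}{2}} dx.$$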



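The $n$-trial binomial distributions $P^n_p(k) = \binom{n}{k}(1-p)^{n-k} p^k$ are also an exponential family because another parameter $\theta := \log\frac{p}{1-p}$ satisfies $P^n_p(k) = \binom{n}{k}\frac{1}{2^n} e^{\theta k + n \log\frac{2}{1+e^\theta}}$. Hence, in the case of the $n$-trial binomial distribution, writing $\theta$ for the success probability and $\theta_0$ for its threshold from here on, the UMP test $\phi^n_{\mathrm{UMP},\alpha}$ of the level $\alpha$ is given as the randomized likelihood ratio test:

$$\phi^n_{\mathrm{UMP},\alpha}(k) = \begin{cases} 1 & \text{if } k < k_0 \\ \gamma & \text{if } k = k_0 \\ 0 & \text{if } k > k_0, \end{cases} \tag{16}$$

where $k_0$ is the maximum value $k'$ satisfying $\alpha \ge \sum_{k=0}^{k'-1} \binom{n}{k}(1-\theta_0)^{n-k}\theta_0^k$, and $\gamma$ is defined by

$$\alpha = \gamma \binom{n}{k_0}(1-\theta_0)^{n-k_0}\theta_0^{k_0} + \sum_{k=0}^{k_0-1} \binom{n}{k}(1-\theta_0)^{n-k}\theta_0^k. \tag{17}$$

Therefore, when $k$ is observed, the p-value is $\sum_{k'=0}^{k} \binom{n}{k'}(1-\theta_0)^{n-k'}\theta_0^{k'}$.

When $n$ is sufficiently large, the distribution $P^n_\theta(k)$ can be approximated by the normal distribution with variance $n(1-\theta)\theta$. Hence, the UMP test $\phi^n_{\mathrm{UMP},\alpha}$ of the level $\alpha$ is approximately given as

$$\phi^n_{\mathrm{UMP},\alpha}(k) \cong \begin{cases} 1 & \text{if } \frac{k}{n} < \theta_0 - \epsilon_\alpha \sqrt{\frac{\theta_0(1-\theta_0)}{n}} \\ 0 & \text{if } \frac{k}{n} \ge \theta_0 - \epsilon_\alpha \sqrt{\frac{\theta_0(1-\theta_0)}{n}}. \end{cases} \tag{18}$$

The p-value is also approximated by

$$\Phi\left(\frac{k - n\theta_0}{\sqrt{n(1-\theta_0)\theta_0}}\right). \tag{19}$$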



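The Poisson distributions $\mathrm{Poi}(\mu)$ are also an exponential family because another parameter $\theta := \log\mu$ satisfies $\mathrm{Poi}(\mu)(n) = \frac{1}{n!} e^{\theta n - e^\theta}$. The UMP test $\phi_{\mathrm{UMP},\alpha}$ of the level $\alpha$ is characterized similarly to (16). When the threshold $\mu_0$ is sufficiently large and the hypothesis is given as

$$H_0: \mu \ge \mu_0 \ \text{ versus } \ H_1: \mu < \mu_0, \tag{20}$$

the UMP test $\phi_{\mathrm{UMP},\alpha}$ of the level $\alpha$ is approximately given as

$$\phi_{\mathrm{UMP},\alpha}(n) \cong \begin{cases} 1 & \text{if } n < \mu_0 - \epsilon_\alpha \sqrt{\mu_0} \\ 0 & \text{if } n \ge \mu_0 - \epsilon_\alpha \sqrt{\mu_0}. \end{cases} \tag{21}$$

The p-value is also approximated by

$$\Phi\left(\frac{n - \mu_0}{\sqrt{\mu_0}}\right). \tag{22}$$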
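To see how close the normal approximation of the Poisson p-value is, one can compare it with the exact lower-tail probability $\sum_{k \le n} \mathrm{Poi}(\mu_0)(k)$ of the non-randomized test. The sketch below uses only the standard library; the function names and the values of $\mu_0$ and $n$ are hypothetical, and the approximate formula is the normal form with mean $\mu_0$ and variance $\mu_0$.

```python
import math


def poisson_cdf(mean: float, n: int) -> float:
    """Exact probability that a Poisson variable with the given mean is at most n."""
    return sum(
        math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
        for k in range(n + 1)
    )


def normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def poisson_p_values(n_observed: int, mu0: float):
    """Exact and normal-approximation p-values for H0: mu >= mu0."""
    exact = poisson_cdf(mu0, n_observed)
    approx = normal_cdf((n_observed - mu0) / math.sqrt(mu0))
    return exact, approx


# Hypothetical example: threshold mu0 = 400 and an observed count of 360.
p_exact, p_approx = poisson_p_values(360, 400.0)
print(p_exact, p_approx)
```

For small $\mu_0$ the two values can differ noticeably, which is why the approximation is stated only for a sufficiently large threshold.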



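Next, we consider testing the following hypothesis in the case of the binomial Poisson distribution $\mathrm{Poi}(\mu_1, \mu_2)$:

$$H_0: \frac{\mu_1}{\mu_1 + \mu_2} \ge \theta_0 \ \text{ versus } \ H_1: \frac{\mu_1}{\mu_1 + \mu_2} < \theta_0. \tag{23}$$

In this case, as is shown at (32) and (31) in Section IV D, the likelihood ratio test $\phi_{\mathrm{LR},r}$ is characterized by the likelihood ratio test of the binomial distributions as

$$\phi_{\mathrm{LR},r}(n_1, n_2) = \phi^{n_1+n_2}_{\mathrm{LR},r}(n_1). \tag{24}$$

Hence, it is suitable to employ the likelihood ratio test $\phi_{\mathrm{LR},l=\alpha}(n_1, n_2) = \phi^{n_1+n_2}_{\mathrm{UMP},\alpha}(n_1)$ with the level $\alpha$. This is because the conditional distribution $\frac{\mathrm{Poi}(\mu_1, \mu_2)(n_1, n_2)}{\sum_{k'=0}^{n_1+n_2} \mathrm{Poi}(\mu_1, \mu_2)(k', n_1+n_2-k')}$ is equal to the binomial distribution $P^{n_1+n_2}_{\frac{\mu_1}{\mu_1+\mu_2}}(n_1)$. Therefore, when we observe $n_1, n_2$, the p-value of this class of likelihood ratio tests is equal to $\sum_{k=0}^{n_1} \binom{n_1+n_2}{k} \theta_0^k (1-\theta_0)^{n_1+n_2-k}$.

When the total number $n_1 + n_2$ is sufficiently large, the test $\phi_{\mathrm{LR},l=\alpha}$ of the level $\alpha$ is approximately given as

$$\phi_{\mathrm{LR},l=\alpha}(n_1, n_2) \cong \begin{cases} 1 & \text{if } \frac{n_1}{n_1+n_2} < \theta_0 - \epsilon_\alpha \sqrt{\frac{\theta_0(1-\theta_0)}{n_1+n_2}} \\ 0 & \text{if } \frac{n_1}{n_1+n_2} \ge \theta_0 - \epsilon_\alpha \sqrt{\frac{\theta_0(1-\theta_0)}{n_1+n_2}}. \end{cases} \tag{25}$$

The p-value is also approximated by

$$\Phi\left(\frac{n_1 - (n_1+n_2)\theta_0}{\sqrt{(n_1+n_2)(1-\theta_0)\theta_0}}\right). \tag{26}$$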

D. Multi-parameter case 

In the one-parameter case, UMP tests can often be characterized by likelihood ratio tests. However, in the multi-parameter case, this type of characterization is generally impossible, and the UMP test does not always exist. In this case, we have to choose our test among non-UMP tests. One idea is to choose our test among likelihood ratio tests, because likelihood ratio tests always exist and we can expect these tests to perform well. Generally, it is not easy to give an explicit form of the likelihood ratio test. When the family is a multi-parameter exponential family, the likelihood ratio test has a simple form. A family of probability distributions $\{P_{\vec\theta} \mid \vec\theta = (\theta^1, \ldots, \theta^m) \in \mathbb{R}^m\}$ is called an $m$-parameter exponential family when there exists an $m$-dimensional random variable $\vec{x} = (x_1, \ldots, x_m)$ such that

$$P_{\vec\theta}(\vec{x}) := P_0(\vec{x}) \exp(\vec\theta \cdot \vec{x} + g(\vec\theta)),$$

where $g(\vec\theta) := -\log \int \exp(\vec\theta \cdot \vec{x}) P_0(d\vec{x})$. However, this form is not sufficiently simple because its rejection region is given by a nonlinear constraint. Hence, a test with a simpler form is required. In the following, we discuss the likelihood ratio test in the case of the multi-nomial Poisson distribution. After this discussion, we propose an alternative test.

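In an $m$-parameter exponential family, the likelihood ratio test $\phi_{\mathrm{LR},r}$ has the form

$$\phi_{\mathrm{LR},r}(\vec{x}) = \begin{cases} 0 & \text{if } \inf_{\vec\theta_1 \in \Theta_1} D(P_{\hat\theta(\vec{x})}\|P_{\vec\theta_1}) - \inf_{\vec\theta_0 \in \Theta_0} D(P_{\hat\theta(\vec{x})}\|P_{\vec\theta_0}) \ge \log r \\ 1 & \text{if } \inf_{\vec\theta_1 \in \Theta_1} D(P_{\hat\theta(\vec{x})}\|P_{\vec\theta_1}) - \inf_{\vec\theta_0 \in \Theta_0} D(P_{\hat\theta(\vec{x})}\|P_{\vec\theta_0}) < \log r, \end{cases} \tag{27}$$

where the divergence $D(P_{\vec\eta}\|P_{\vec\theta})$ is defined as

$$D(P_{\vec\eta}\|P_{\vec\theta}) := \int \log\frac{P_{\vec\eta}(\vec{x})}{P_{\vec\theta}(\vec{x})} P_{\vec\eta}(d\vec{x}) = (\vec\eta - \vec\theta) \cdot \int \vec{x}\, P_{\vec\eta}(d\vec{x}) + g(\vec\eta) - g(\vec\theta),$$

and $\hat\theta(\vec{x})$ is defined by

$$\int \vec{x}'\, P_{\hat\theta(\vec{x})}(d\vec{x}') = \vec{x}. \tag{28}$$

This is because the logarithm of the likelihood ratio is calculated as

$$\begin{aligned} \log\frac{\sup_{\vec\theta_0 \in \Theta_0} P_{\vec\theta_0}(\vec{x})}{\sup_{\vec\theta_1 \in \Theta_1} P_{\vec\theta_1}(\vec{x})} &= \sup_{\vec\theta_0 \in \Theta_0} \inf_{\vec\theta_1 \in \Theta_1} \log\frac{P_{\vec\theta_0}(\vec{x})}{P_{\vec\theta_1}(\vec{x})} \\ &= \sup_{\vec\theta_0 \in \Theta_0} \inf_{\vec\theta_1 \in \Theta_1} (\vec\theta_0 - \vec\theta_1) \cdot \vec{x} + g(\vec\theta_0) - g(\vec\theta_1) \\ &= \sup_{\vec\theta_0 \in \Theta_0} \inf_{\vec\theta_1 \in \Theta_1} (\vec\theta_0 - \vec\theta_1) \cdot \int \vec{x}'\, P_{\hat\theta(\vec{x})}(d\vec{x}') + g(\vec\theta_0) - g(\vec\theta_1) \\ &= \sup_{\vec\theta_0 \in \Theta_0} \inf_{\vec\theta_1 \in \Theta_1} D(P_{\hat\theta(\vec{x})}\|P_{\vec\theta_1}) - D(P_{\hat\theta(\vec{x})}\|P_{\vec\theta_0}) \\ &= \inf_{\vec\theta_1 \in \Theta_1} D(P_{\hat\theta(\vec{x})}\|P_{\vec\theta_1}) - \inf_{\vec\theta_0 \in \Theta_0} D(P_{\hat\theta(\vec{x})}\|P_{\vec\theta_0}). \end{aligned}$$

In addition, $\hat\theta(\vec{x})$ coincides with the MLE when $\vec{x}$ is observed. Hence, when $\Theta = \Theta_0 \cup \Theta_1$, the likelihood ratio test with the ratio $r < 1$ is given by the rejection region:

$$\left\{ \vec{x} \,\middle|\, \hat\theta(\vec{x}) \in \Theta_1,\ \inf_{\vec\theta_0 \in \Theta_0} D(P_{\hat\theta(\vec{x})}\|P_{\vec\theta_0}) > -\log r \right\}. \tag{29}$$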



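In the case of the multi-nomial Poisson distributions $\mathrm{Poi}(\vec\mu)(\vec{k}) := e^{-\sum_{i=1}^m \mu_i} \prod_{i=1}^m \frac{\mu_i^{k_i}}{k_i!}$, which form an exponential family, the divergence is calculated as

$$D(\mathrm{Poi}(\vec\mu)\|\mathrm{Poi}(\vec\mu')) = \sum_{i=1}^m (\mu'_i - \mu_i) + \sum_{i=1}^m \mu_i \log\frac{\mu_i}{\mu'_i} \tag{30}$$

$$= \sum_{i=1}^m (\mu'_i - \mu_i) + \left(\sum_{i=1}^m \mu_i\right) \log\frac{\sum_{i=1}^m \mu_i}{\sum_{i=1}^m \mu'_i} + \left(\sum_{i=1}^m \mu_i\right) D\left(\frac{\vec\mu}{\sum_{i=1}^m \mu_i} \middle\| \frac{\vec\mu'}{\sum_{i=1}^m \mu'_i}\right), \tag{31}$$

where $D(p\|p')$ is the divergence between the multinomial distributions $p$ and $p'$.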
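The divergence between two multi-nomial Poisson distributions splits into a part that depends only on the total intensities and a part given by the divergence between the normalized intensity vectors. The sketch below checks this decomposition numerically for strictly positive intensities; the function names and the example vectors are hypothetical.

```python
import math


def poisson_divergence(mu, mu_prime):
    """Divergence D(Poi(mu) || Poi(mu')) as the sum of mu'_i - mu_i + mu_i log(mu_i / mu'_i)."""
    return sum(b - a + a * math.log(a / b) for a, b in zip(mu, mu_prime))


def decomposed_divergence(mu, mu_prime):
    """Same divergence, split into total-intensity and normalized-distribution parts."""
    total, total_prime = sum(mu), sum(mu_prime)
    p = [a / total for a in mu]
    q = [b / total_prime for b in mu_prime]
    kl = sum(x * math.log(x / y) for x, y in zip(p, q))  # D(p || q)
    return total_prime - total + total * math.log(total / total_prime) + total * kl


mu = [3.0, 5.0, 2.0]        # hypothetical intensities
mu_prime = [4.0, 4.0, 4.0]  # hypothetical intensities

direct = poisson_divergence(mu, mu_prime)
split = decomposed_divergence(mu, mu_prime)
print(direct, split)
```

When the two total intensities coincide, only the last term survives, which is how the two-count problem reduces to a binomial one.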

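When the hypothesis is given by (23) and $\frac{n_1}{n_1+n_2} \ge \theta_0$, we have

$$\log\frac{\sup_{\vec\theta_0 \in \Theta_0} P_{\vec\theta_0}(n_1, n_2)}{\sup_{\vec\theta_1 \in \Theta_1} P_{\vec\theta_1}(n_1, n_2)} = (n_1 + n_2)\, D\left(P_{\frac{n_1}{n_1+n_2}} \middle\| P_{\theta_0}\right) = D\left(P^{n_1+n_2}_{\frac{n_1}{n_1+n_2}} \middle\| P^{n_1+n_2}_{\theta_0}\right), \tag{32}$$

where $P_\theta$ is the binomial distribution with one observation and $P^n_\theta$ is the binomial distribution with $n$ observations. Then, the likelihood ratio test is given by the likelihood ratio test of the binomial distributions.

In the following, we treat two hypotheses given as

$$H_0: \vec{w} \cdot \vec\mu \ge c_0 \ \text{ versus } \ H_1: \vec{w} \cdot \vec\mu < c_0, \tag{33}$$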



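with the condition $w_i \ge 0$. Using the formulas (30) and (33), we can calculate the likelihood ratio test for a given ratio $r$. Now, we calculate the p-value concerning the class of likelihood ratio tests when we observe the data $k_1, \ldots, k_m$. When $\vec{w} \cdot \vec{k} < c_0$, this p-value is equal to

$$\max_{\vec{w} \cdot \vec\mu' = c_0} \mathrm{Poi}(\vec\mu')(A_{R(\vec{k})}), \tag{34}$$

where

$$A_R := \left\{ \vec{k}' \,\middle|\, \vec{w} \cdot \vec{k}' < c_0,\ \min_{\vec{w} \cdot \vec\mu = c_0} \sum_{i=1}^m (\mu_i - k'_i) + \sum_{i=1}^m k'_i \log\frac{k'_i}{\mu_i} \ge R \right\},$$

$$R(\vec{k}) := \min_{\vec{w} \cdot \vec\mu = c_0} \sum_{i=1}^m (\mu_i - k_i) + \sum_{i=1}^m k_i \log\frac{k_i}{\mu_i},$$

because the maximum $R$ satisfying $\vec{k} \in A_R$ is $R(\vec{k})$.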
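Since the calculation of (34) is not so easy, we consider its upper bound. For this purpose, we define the set $B_R$ as

$$B_R := \left\{ \vec{k}' \,\middle|\, \sum_{i=1}^m \frac{k'_i}{\mu_i(R)} \le 1 \right\}, \tag{35}$$

where $\mu_i(R)$ are defined as follows:

$$\frac{c_0}{w_i} - \mu_i(R) + \mu_i(R) \log\frac{\mu_i(R)\, w_i}{c_0} = R \quad \text{if } R \le R_{0,i}, \tag{36}$$

$$\frac{c_0}{w_M} + \mu_i(R) \log\frac{w_M - w_i}{w_M} = R \quad \text{if } R > R_{0,i}, \tag{37}$$

where $w_M := \max_i w_i$ and $R_{0,i} := \frac{c_0}{w_M} + c_0\left(\frac{1}{w_i} - \frac{1}{w_M}\right) \log\frac{w_M - w_i}{w_M}$. Note that $\mu_i(R)$ is a monotone decreasing function of $R$. As is shown in Appendix D,

$$A_R \subset B_R. \tag{38}$$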

Then, the p-value concerning likelihood ratio tests is bounded from above by

$$\max_{\vec{w} \cdot \vec\mu' = c_0} \mathrm{Poi}(\vec\mu')(B_{R(\vec{k})}). \tag{39}$$

However, it is difficult to choose the likelihood ratio $r$ such that the p-value is equal to a given risk probability $\alpha$ because the set $A_R$ is defined by a non-linear constraint. In order to resolve this problem, we propose to modify the likelihood ratio test by using the set $B_R$ instead of the set $A_R$, because $B_R$ is defined by a linear constraint while $A_R$ is defined by a non-linear constraint. That is, we define the modified test $\phi_{\mathrm{mod},R}$ as the test with the rejection region $B_R$. Among this kind of tests, we can choose the test $\phi_{\mathrm{mod},R_\alpha}$ with the risk probability $\alpha$ by choosing $R_\alpha$ in the following way:

$$\max_{\vec{w} \cdot \vec\mu' = c_0} \mathrm{Poi}(\vec\mu')(B_{R_\alpha}) = \alpha. \tag{40}$$

Indeed, the calculation of the probability $\mathrm{Poi}(\vec\mu')(B_R)$ is easier than that of the probability $\mathrm{Poi}(\vec\mu')(A_R)$ because of the linearity of the constraint condition of $B_R$.
Next, we calculate the p-value of the set of the modified tests $\{\phi_{\mathrm{mod},R}\}_R$. For observed data $\vec{k}$, we choose $R'(\vec{k})$ as the $R'$ satisfying

$$\sum_{i=1}^m \frac{k_i}{\mu_i(R')} = 1. \tag{41}$$

The LHS is monotone increasing in $R'$ because each $\mu_i(R')$ is monotone decreasing in $R'$. Thus, $R'(\vec{k})$ is the maximum $R'$ such that $\vec{k} \in B_{R'}$. Then, the p-value is equal to $\max_{\vec{w} \cdot \vec\mu' = c_0} \mathrm{Poi}(\vec\mu')(B_{R'(\vec{k})})$. Further, the relation (38) implies $\vec{k} \in A_{R(\vec{k})} \subset B_{R(\vec{k})}$. Hence, $R(\vec{k}) \le R'(\vec{k})$, which implies $B_{R'(\vec{k})} \subset B_{R(\vec{k})}$. Therefore, the p-value $\max_{\vec{w} \cdot \vec\mu' = c_0} \mathrm{Poi}(\vec\mu')(B_{R'(\vec{k})})$ concerning the modified tests $\{\phi_{\mathrm{mod},R}\}_R$ is smaller than the upper bound $\max_{\vec{w} \cdot \vec\mu' = c_0} \mathrm{Poi}(\vec\mu')(B_{R(\vec{k})})$ of the p-value concerning the likelihood ratio tests. This test $\phi_{\mathrm{mod}}$ coincides with the likelihood ratio test in the one-parameter case.

of the normal distribution family with the covariance matrix $(nJ_\theta)^{-1}$, where the Fisher information matrix $J_{\theta;i,j}$ is given by

V. ASYMPTOTIC THEORY 

A. Fisher information 

Assume that the data Xi, . . . ,x n obeys the identical 
and independent distribution of the same distribution 
family pg and n is sufficiently large. When the true pa- 
rameter 9 is close to 9q, it is known that the meaning- 
ful information for 9 is essentially given as the random 
variable — Y^i=\ h { x i)i where the logarithmic derivative 
lg Q (xi) is defined by 



l 9 (x) 



d\ogp e (x) 
d9 



(42) 



In this case, the random variable — Yl^—i l>6 ( x i) can be 
approximated by the normal distribution with the expec- 
tation value 9—9q and the variance —4— , where the Fisher 

information Jg is defined as Jg :— j \lg(x)) 2 Pg(dx). 
Hence, the testing problem can be approximated by the 
testing of this normal distribution family 0, 0] . That 
is, the quality of testing is approximately evaluated by 
the Fisher information Jg Q at the threshold 9o. 

In the case of Poisson distribution family Poi(#i), the 
parameter 9 can be estimated by y . The asymptotic case 
corresponds to the case with large t. In this case, Fisher 
information is 4. When X obeys the unknown Poisson 
distribution family Poi(#t), the estimation error ^ — 9 is 
close to the normal distribution with the variance |, i.e., 
V^Ct" — Q) approaches to the random variables obeying 
the normal distribution with variance 9. That is, Fisher 
information corresponds to the inverse of variance of the 
estimator. 

This approximation can be extended to the multi- 
parameter case {pg\ 9 £ R" 1 }. Similarly, it is known that 
the testing problem can be approximated by the testing 



J0;i,j ■= J lg-i{x)lg-j(x)Pg{dx), 
d log pg(x) 



<•(*) 



(43) 
(44) 
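As a numerical illustration of the one-parameter Poisson case discussed above, the following Python sketch evaluates the Fisher information of $\mathrm{Poi}(\theta t)$ directly from its definition and can be compared with the closed form $t/\theta$. The function name and the truncation rule for the infinite sum are assumptions made only for this example.

```python
import math

def poisson_fisher_information(theta, t, k_max=None):
    """Evaluate J = E[(d/dtheta log p(X))^2] for X ~ Poi(theta * t) by summation."""
    mean = theta * t
    if k_max is None:
        # Assumed truncation: mean plus 20 standard deviations plus a margin.
        k_max = int(mean + 20.0 * math.sqrt(mean) + 50)
    total = 0.0
    log_pmf = -mean  # log P(X = 0)
    for k in range(k_max + 1):
        if k > 0:
            # Recurrence: P(k) = P(k - 1) * mean / k
            log_pmf += math.log(mean) - math.log(k)
        score = k / theta - t  # derivative of k*log(theta*t) - theta*t in theta
        total += math.exp(log_pmf) * score ** 2
    return total

if __name__ == "__main__":
    # Expected to be close to t / theta = 25
    print(poisson_fisher_information(2.0, 50.0))
```

The sum reproduces $t/\theta$ because the variance of $X$ is $\theta t$ and the score is $X/\theta - t$.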



When the hypotheses is given by the testing prob- 
lem can be approximated by the testing of the normal 

distribution family with variance 



w-J„ 



Indeed, the same fact holds for the multinomial Pois- 
son distribution family Poi(tp). When the random vari- 
able Xj is the i-th random variable, the random variable 
Sj=i ~!/t(Xj— Pj) converges to the random variable obey- 
ing the normal distribution with the variance X^JLi ^j^j 
in distribution: 



■lib * 



i=i 



(45) 



This convergence is compact uniform concerning the pa- 
rameter p. In this case, the Fisher information matrix 
J M is the diagonal matrix with the diagonal elements 
• •• j 7T - )- When our distribution family is given as 
a subfamily Poi(tpi(9), . . . , tp m (9)), the Fisher informa- 
tion matrix is AgJ^g^Ag, where Ag-^j = Hence, 
when the hypotheses is given by the testing prob- 

lem can be approximated by the testing of the normal 
distribution family with variance 



w ■ (AgJ^g)Ag) 1 w. 



(46) 



In the following, we call this value Fisher information. 
Based on this value, the quality can be compared when 
we have several testing schemes. 



B. Multi-parametric Poisson distribution 

In the following, we treat testing of the hypothesis (|33|) 
in the multinomial Poisson distribution Poi(/2) by using 
normal approximation. In this case, by using pi defined 
in (|5oT) and lET7|) . the upper bound of the p-value 
concerning the likelihood ratio tests is approximated to 



max $ 

Wfl' = C 



1 _ v m - 



w(fl(fc)) 



Em 
i=l J, 



=<!> 



i - 



\ 



max 

W • \JL ! —Cq 



Em 
i=l 
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because this convergence 1)450 is compact uniform con- where 
cerning the parameter p,. Letting Xi(R) = ^ — 1 

and y 4 (i?) = W J° {R) 2 , we have 



max 



- ^ j=1 = max ^, (47) 

t[ (x,y)eCo(R(k)) y/V 



Em 
t=l 



MAW) 2 



where Co(R) is the convex hull of 
(zi(fl)>yi(-H)),---,(z m (.R),3/ m (iZ)). As is shown in 
Appendix El this value is simplified to 



Zij(R) :-- 



M3 

y/v^M 

Zi,j(R) 



if 



2x j (R) Vi (R) 



> 1 



x i (R)y i (R)+x i (R)y j (R) 

■ r 2xdR)y 3 {R) > n 

x ] (R)y j (R)+x j (R)y t (R) 

otherwise, 



(49) 



min z i j(R(k)), 



(48) 



where 



Zij(R) := 



2(x4^-(i?)(y4fi) + y 3 (i?))-^(fl) 2 y J (i?)- a;3 (i 1 ;) 2 ^(i?)) 
V^(S) - Xj(R))(yi(R) ~ y ] {R))^x l {R)y 1 {Ry + Xj (R) yi (R) 2 - y t {R)y 3 {R){x,{R) + Xj (R)) 



(50) 



r 



That is, our upper bound of the p-value concerning the likelihood ratio tests is given by

$$\Phi\Big(-\min_{i,j} z_{i,j}(R(k))\Big). \tag{51}$$

Next, we approximately calculate the test with the risk probability $\alpha$ proposed in Section IV D. First, we choose $R_\alpha$ by



min Zi j (R a ) = $ 1 (a) . 



(52) 



Then, our test is given by the rejection region $B_{R_\alpha}$. Using the same discussion, the p-value concerning the proposed tests is equal to

$$\Phi\Big(-\min_{i,j} z_{i,j}(R'(k))\Big). \tag{53}$$



VI. MODIFICATION OF VISIBILITY


In the following sections, we apply the discussions in 
sections lTTTl -IVlto the hypothesis (J3J. That is, we consider 
how to reject the null hypothesis Hq : F < Fq with a 
certain risk probability a. 

In the usual visibility method, the coincidence events are measured only in one or two directions. However, in this method, the number of counts of coincidence events reflects not only the fidelity but also the direction of the difference between the true state and the target maximally entangled state. In order to remove the bias based on such a direction, we propose to measure the counts on the coincidence vectors $|HH\rangle$, $|VV\rangle$, $|DD\rangle$, $|XX\rangle$, $|RL\rangle$, and $|LR\rangle$, which correspond to the coincidence events, and the counts on the anti-coincidence vectors $|HV\rangle$, $|VH\rangle$, $|DX\rangle$, $|XD\rangle$, $|RR\rangle$, and $|LL\rangle$, which correspond to the anti-coincidence events. The former correspond to the maximum values in the usual visibility, and the latter to the minimum values. In this paper, we call this proposed method the modified visibility method. Using this method, we can test the fidelity between the maximally entangled state $|\Phi^{(+)}\rangle\langle\Phi^{(+)}|$ and the given state $\sigma$, using the total number of counts of the coincidence events $n_1$ and the total number of counts of the anti-coincidence events $n_2$, obtained by measuring each of the vectors for the time $t/12$. When the dark count is negligible, the total count on coincidence events $n_1$ obeys $\mathrm{Poi}(\lambda\frac{2F+1}{12}t)$, and the total count on anti-coincidence events $n_2$ obeys $\mathrm{Poi}(\lambda\frac{2-2F}{12}t)$. These expectation values $\mu_1$ and $\mu_2$ are given as $\mu_1 = \lambda\frac{2F+1}{12}t$ and $\mu_2 = \lambda\frac{2-2F}{12}t$. Hence, the Fisher information matrix concerning the parameters $F$ and $\lambda$ is

$$\begin{pmatrix} \frac{\lambda t}{3(2F+1)} + \frac{\lambda t}{3(2-2F)} & 0 \\ 0 & \frac{t}{4\lambda} \end{pmatrix}, \tag{54}$$

where the first element corresponds to the parameter $F$ and the second one to the parameter $\lambda$. Then, we can apply the likelihood ratio test $\phi_{\mathrm{LR}}$ given in Section IV. That is, based on the ratio $\frac{\mu_2}{\mu_1+\mu_2} = \frac{2}{3}(1-F)$, we estimate the fidelity using the ratio $\frac{n_2}{n_1+n_2}$ as

$$\hat{F}(n_1,n_2) = 1 - \frac{3}{2}\,\frac{n_2}{n_1+n_2}.$$

Based on the discussion in subsection IV A, its variance is asymptotically equal to

$$\frac{1}{\lambda t\left(\frac{1}{3(2F+1)} + \frac{1}{3(2-2F)}\right)} = \frac{(2F+1)(2-2F)}{\lambda t}. \tag{55}$$

Hence, similarly to the visibility, we can check the fidelity by using this ratio.

Indeed, when we consider the distribution under the condition that the total count $n_1+n_2$ is fixed to $n$, the random variable $n_2$ obeys the binomial distribution with the average value $\frac{2}{3}(1-F)n$. Hence, we can apply the likelihood ratio test of the binomial distribution. In this case, by the approximation to the normal distribution, the likelihood ratio test with the risk probability $\alpha$ is almost equal to the test with the rejection region

$$\left\{ (n_1,n_2) \,\middle|\, \frac{n_2}{n_1+n_2} \le \frac{2}{3}(1-F_0) + \Phi^{-1}(\alpha)\sqrt{\frac{(2F_0+1)(2-2F_0)}{9(n_1+n_2)}} \right\}$$

concerning the null hypothesis $H_0 : F \le F_0$. The p-value of this kind of tests is

$$\Phi\left(\frac{n_2(2F_0+1) - n_1(2-2F_0)}{\sqrt{(n_1+n_2)(2F_0+1)(2-2F_0)}}\right).$$
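The estimator and the normal-approximation p-value of the modified visibility method can be sketched in a few lines of Python. This is only an illustration of the formulas above under the stated binomial approximation; the function names are hypothetical, and $\Phi$ is implemented with the standard error function.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def modified_visibility_test(n1, n2, f0):
    """Return (fidelity estimate, approximate p-value) for H0: F <= f0.

    n1: total coincidence count, n2: total anti-coincidence count.
    Assumes negligible dark counts and 0 <= f0 < 1.
    """
    n = n1 + n2
    if n <= 0:
        raise ValueError("at least one count is required")
    f_hat = 1.0 - 1.5 * n2 / n
    z = (n2 * (2 * f0 + 1) - n1 * (2 - 2 * f0)) / math.sqrt(
        n * (2 * f0 + 1) * (2 - 2 * f0)
    )
    return f_hat, normal_cdf(z)

if __name__ == "__main__":
    f_hat, p_value = modified_visibility_test(950, 50, 0.875)
    print(f_hat, p_value)
```

A small p-value indicates that the anti-coincidence fraction is too low to be compatible with $F \le F_0$, so the null hypothesis is rejected.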



VII. DESIGN I (λ: UNKNOWN, ONE STAGE)

In this section, we consider the problem of testing the fidelity between the maximally entangled state $|\Phi^{(+)}\rangle\langle\Phi^{(+)}|$ and the given state $\sigma$ by performing three kinds of measurement, coincidence, anti-coincidence, and total flux, with the times $t_1$, $t_2$ and $t_3$, respectively. When the dark count is negligible, the data $(n_1,n_2,n_3)$ obeys the multinomial Poisson distribution $\mathrm{Poi}(\lambda\frac{2F+1}{6}t_1, \lambda\frac{2-2F}{6}t_2, \lambda t_3)$ with the assumption that the parameter $\lambda$ is unknown. In this problem, it is natural to assume that we can select the time allocation with the constraint for the total time $t_1 + t_2 + t_3 = t$.

The performance of the time allocation $(t_1,t_2,t_3)$ can be evaluated by the variance (46). The Fisher information matrix concerning the parameters $F$ and $\lambda$ is

$$\begin{pmatrix} \frac{2\lambda t_1}{3(2F+1)} + \frac{2\lambda t_2}{3(2-2F)} & \frac{t_1 - t_2}{3} \\ \frac{t_1 - t_2}{3} & \frac{1}{\lambda}\left(\frac{2F+1}{6}t_1 + \frac{2-2F}{6}t_2 + t_3\right) \end{pmatrix}, \tag{56}$$

where the first element corresponds to the parameter $F$ and the second one to the parameter $\lambda$. Then, the asymptotic variance (46) is calculated as

$$\left[\lambda\left(\frac{2t_1}{3(2F+1)} + \frac{2t_2}{3(2-2F)} - \frac{\left(\frac{t_1-t_2}{3}\right)^2}{\frac{2F+1}{6}t_1 + \frac{2-2F}{6}t_2 + t_3}\right)\right]^{-1}. \tag{57}$$
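The relation between the Fisher information matrix (56) and the asymptotic variance (57) can be checked numerically. The Python sketch below builds the three entries of the $2\times 2$ matrix, computes the variance of $F$ through the Schur complement, and compares it with the $(F,F)$ entry of the explicit matrix inverse. The function names are assumptions for illustration only.

```python
def design1_fisher_matrix(f, lam, t1, t2, t3):
    """Entries (J_FF, J_Flam, J_lamlam) of the Fisher matrix (56)."""
    j_ff = 2 * lam * t1 / (3 * (2 * f + 1)) + 2 * lam * t2 / (3 * (2 - 2 * f))
    j_fl = (t1 - t2) / 3
    j_ll = ((2 * f + 1) / 6 * t1 + (2 - 2 * f) / 6 * t2 + t3) / lam
    return j_ff, j_fl, j_ll

def design1_variance(f, lam, t1, t2, t3):
    """Asymptotic variance (57): inverse of the Schur complement of J_lamlam."""
    j_ff, j_fl, j_ll = design1_fisher_matrix(f, lam, t1, t2, t3)
    return 1.0 / (j_ff - j_fl ** 2 / j_ll)

def design1_variance_by_inverse(f, lam, t1, t2, t3):
    """(F, F) entry of the inverse of the 2x2 Fisher matrix, for comparison."""
    j_ff, j_fl, j_ll = design1_fisher_matrix(f, lam, t1, t2, t3)
    return j_ll / (j_ff * j_ll - j_fl ** 2)

if __name__ == "__main__":
    print(design1_variance(0.9, 100.0, 3.0, 5.0, 2.0))
    print(design1_variance_by_inverse(0.9, 100.0, 3.0, 5.0, 2.0))
```

With the equal split $t_1 = t_2 = t/2$, $t_3 = 0$ the off-diagonal entry vanishes and the variance reduces to $(2F+1)(2-2F)/(\lambda t)$, the value of the modified visibility method.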



We optimize the time allocation by minimizing the variance (57). We perform the minimization by maximizing its inverse,

$$\lambda\left(\frac{2t_1}{3(2F+1)} + \frac{2t_2}{3(2-2F)} - \frac{\left(\frac{t_1-t_2}{3}\right)^2}{\frac{2F+1}{6}t_1 + \frac{2-2F}{6}t_2 + t_3}\right).$$

Applying the lemmas shown in Appendix A to the case of $a = \frac{2}{3(2F+1)}$, $b = \frac{2}{3(2-2F)}$, $c = \frac{2F+1}{6}$, $d = \frac{2-2F}{6}$, we obtain

(i)
$$\lambda \max_{t_1+t_3=t}\left[\frac{2t_1}{3(2F+1)} - \frac{\left(\frac{t_1}{3}\right)^2}{\frac{2F+1}{6}t_1 + t_3}\right] = \frac{2\lambda t}{3(2F+1)\left(1+\sqrt{\frac{2F+1}{6}}\right)^2}, \tag{58}$$

(ii)
$$\lambda \max_{t_2+t_3=t}\left[\frac{2t_2}{3(2-2F)} - \frac{\left(\frac{t_2}{3}\right)^2}{\frac{2-2F}{6}t_2 + t_3}\right] = \frac{2\lambda t}{3(2-2F)\left(1+\sqrt{\frac{2-2F}{6}}\right)^2}, \tag{59}$$

and

(iii)
$$\lambda \max_{t_1+t_2=t}\left[\frac{2t_1}{3(2F+1)} + \frac{2t_2}{3(2-2F)} - \frac{\left(\frac{t_1-t_2}{3}\right)^2}{\frac{2F+1}{6}t_1 + \frac{2-2F}{6}t_2}\right] = \frac{\lambda t\left(\frac{1}{3}\sqrt{\frac{2-2F}{2F+1}} + \frac{1}{3}\sqrt{\frac{2F+1}{2-2F}}\right)^2}{\left(\sqrt{\frac{2F+1}{6}} + \sqrt{\frac{2-2F}{6}}\right)^2} = \frac{6\lambda t}{(2F+1)(2-2F)\left(\sqrt{2F+1}+\sqrt{2-2F}\right)^2}. \tag{60}$$

Then, these relations give the optimal time allocations between (i) coincidence and total flux measurements, (ii) anti-coincidence and total flux measurements, and (iii) coincidence and anti-coincidence measurements, respectively. The ratio of (60) to (58) is equal to

$$\frac{3\left(\sqrt{6}+\sqrt{2F+1}\right)^2}{2(2-2F)\left(\sqrt{2F+1}+\sqrt{2-2F}\right)^2} > 1, \tag{61}$$

as shown in Appendix B. That is, the optimal measurement using the coincidence and the anti-coincidence always provides a better test than that using the coincidence and the total flux. Hence, we compare (ii) with (iii), and obtain

$$\max_{t_1+t_2+t_3=t} \lambda\left(\frac{2t_1}{3(2F+1)} + \frac{2t_2}{3(2-2F)} - \frac{\left(\frac{t_1-t_2}{3}\right)^2}{\frac{2F+1}{6}t_1 + \frac{2-2F}{6}t_2 + t_3}\right) = \begin{cases} \dfrac{4\lambda t}{(2-2F)\left(\sqrt{6}+\sqrt{2-2F}\right)^2} & \text{if } F_1 < F \le 1, \\[2ex] \dfrac{6\lambda t}{(2F+1)(2-2F)\left(\sqrt{2F+1}+\sqrt{2-2F}\right)^2} & \text{if } 0 \le F \le F_1, \end{cases} \tag{62}$$

where the critical point $F_1 < 1$ is defined by

$$\frac{2(2F_1+1)\left(\sqrt{2F_1+1}+\sqrt{2-2F_1}\right)^2}{3\left(\sqrt{6}+\sqrt{2-2F_1}\right)^2} = 1. \tag{63}$$

The approximated value of the critical point $F_1$ is 0.899519. The equation (62) is derived in Appendix C.

Fig. 2 shows the ratio of the optimal Fisher information based on the anti-coincidence and total flux measurements to that based on the coincidence and anti-coincidence measurements. When $F_1 < F \le 1$, the maximum Fisher information is attained by $t_1 = 0$, $t_2 = \frac{\sqrt{6}}{\sqrt{6}+\sqrt{2(1-F)}}\,t$, $t_3 = \frac{\sqrt{2(1-F)}}{\sqrt{6}+\sqrt{2(1-F)}}\,t$. Otherwise, the maximum is attained by $t_1 = \frac{\sqrt{2-2F}}{\sqrt{2F+1}+\sqrt{2-2F}}\,t$, $t_2 = \frac{\sqrt{2F+1}}{\sqrt{2F+1}+\sqrt{2-2F}}\,t$, $t_3 = 0$. The optimal time allocation shown in Fig. 2 implies that we should measure the counts on the anti-coincidence vectors preferentially over other vectors.

FIG. 2: The ratio of the optimal Fisher information (solid line) and the optimal time allocation as a function of the fidelity $F$. The measurement time is divided into three periods: coincidence $t_1$ (plus signs), anti-coincidence $t_2$ (circles), and total flux $t_3$ (squares), which are normalized as $t_1 + t_2 + t_3 = 1$ in the plot.

The optimal asymptotic variance is $\frac{(2F+1)(2-2F)\left(\sqrt{2-2F}+\sqrt{1+2F}\right)^2}{6\lambda t}$ when the threshold $F_0$ is less than the critical point $F_1$. This asymptotic variance is much better than that obtained by the modified visibility method. The ratio of the optimal asymptotic variance to that of the modified visibility method is given by

$$\frac{\left(\sqrt{2-2F}+\sqrt{1+2F}\right)^2}{6} \le 1. \tag{64}$$

In the following, we give the optimal test of level $\alpha$ for this hypothesis testing problem. Assume that the threshold $F_0$ is less than the critical point $F_1$. In this case, the problem reduces to the testing of a binomial ratio as follows. First, we measure the count on the coincidence vectors for a period of $t_1 = \frac{t\sqrt{2-2F_0}}{\sqrt{2F_0+1}+\sqrt{2-2F_0}}$, to obtain the total count $n_1$. Then, we measure the count on the anti-coincidence vectors for a period of $t_2 = \frac{t\sqrt{2F_0+1}}{\sqrt{2F_0+1}+\sqrt{2-2F_0}}$ to obtain the total count $n_2$. Note that the optimal time allocation depends on the threshold of our hypothesis. Finally, we apply the UMP test of level $\alpha$ of the hypothesis

$$H_0 : p \ge \frac{\sqrt{2-2F_0}}{\sqrt{2-2F_0}+\sqrt{1+2F_0}} \quad \text{versus} \quad H_1 : p < \frac{\sqrt{2-2F_0}}{\sqrt{2-2F_0}+\sqrt{1+2F_0}}$$

with the binomial distribution family $P_p^{n_1+n_2}$ to the data $n_2$. In this case, the likelihood ratio test with the risk probability $\alpha$ is almost equal to the test with the rejection region

$$\left\{ (n_1,n_2) \,\middle|\, \frac{n_2}{n_1+n_2} \le \frac{\sqrt{2-2F_0}}{\sqrt{2F_0+1}+\sqrt{2-2F_0}} + \Phi^{-1}(\alpha)\sqrt{\frac{\sqrt{2F_0+1}\sqrt{2-2F_0}}{(n_1+n_2)\left(\sqrt{2F_0+1}+\sqrt{2-2F_0}\right)^2}} \right\}$$

concerning the null hypothesis $H_0 : F \le F_0$. The p-value of this kind of tests is

$$\Phi\left(\frac{n_2\sqrt{2F_0+1} - n_1\sqrt{2-2F_0}}{\sqrt{(n_1+n_2)\sqrt{2F_0+1}\sqrt{2-2F_0}}}\right).$$

We can apply a similar testing for $F_0 > F_1$. It is sufficient to replace the time allocation by $t_1 = 0$, $t_2 = \frac{\sqrt{6}\,t}{\sqrt{6}+\sqrt{2(1-F_0)}}$, $t_3 = \frac{\sqrt{2(1-F_0)}\,t}{\sqrt{6}+\sqrt{2(1-F_0)}}$. In this case, the likelihood ratio test with the risk probability $\alpha$ is almost equal to the test with the rejection region

$$\left\{ (n_2,n_3) \,\middle|\, \frac{n_2}{n_2+n_3} \le \frac{\sqrt{1-F_0}}{\sqrt{1-F_0}+\sqrt{3}} + \Phi^{-1}(\alpha)\sqrt{\frac{\sqrt{3}\sqrt{1-F_0}}{(n_2+n_3)\left(\sqrt{1-F_0}+\sqrt{3}\right)^2}} \right\}$$

concerning the null hypothesis $H_0 : F \le F_0$. The p-value of this kind of tests is

$$\Phi\left(\frac{n_2\sqrt{3} - n_3\sqrt{1-F_0}}{\sqrt{(n_2+n_3)\sqrt{1-F_0}\sqrt{3}}}\right).$$
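The optimal allocation of Design I without dark counts can be sketched as follows. The Python code below evaluates the two candidates (ii) and (iii), returns the better allocation, cross-checks the result against the inverse of the variance (57), and locates the critical point by bisection on the equivalent condition $\sqrt{2F+1}-\sqrt{2-2F}=\sqrt{3/2}$ (the $\delta'=0$ case of the critical-point equation used for the dark-count analysis). Function names are hypothetical, and the code assumes $0 \le F < 1$.

```python
import math

def fisher_information(f, lam, t1, t2, t3):
    """Inverse of the asymptotic variance (57) for a given allocation."""
    j_ff = 2 * lam * t1 / (3 * (2 * f + 1)) + 2 * lam * t2 / (3 * (2 - 2 * f))
    j_fl = (t1 - t2) / 3
    j_ll = ((2 * f + 1) / 6 * t1 + (2 - 2 * f) / 6 * t2 + t3) / lam
    return j_ff - j_fl ** 2 / j_ll

def design1_optimal(f, lam=1.0, t=1.0):
    """Return ((t1, t2, t3), optimal Fisher information) for 0 <= f < 1."""
    s1, s2, s6 = math.sqrt(2 * f + 1), math.sqrt(2 - 2 * f), math.sqrt(6.0)
    info_ii = 4 * lam * t / ((2 - 2 * f) * (s6 + s2) ** 2)
    info_iii = 6 * lam * t / ((2 * f + 1) * (2 - 2 * f) * (s1 + s2) ** 2)
    if info_ii > info_iii:
        # Anti-coincidence and total flux
        return (0.0, t * s6 / (s6 + s2), t * s2 / (s6 + s2)), info_ii
    # Coincidence and anti-coincidence
    return (t * s2 / (s1 + s2), t * s1 / (s1 + s2), 0.0), info_iii

def critical_fidelity(tol=1e-12):
    """Bisection for the fidelity at which candidates (ii) and (iii) coincide."""
    def g(f):
        return math.sqrt(2 * f + 1) - math.sqrt(2 - 2 * f) - math.sqrt(1.5)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(design1_optimal(0.5))
    print(design1_optimal(0.95))
    print(critical_fidelity())
```

Below the critical fidelity the total flux measurement receives no time at all, which is why the resulting test only needs the two counts $n_1$ and $n_2$.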
Next, we consider the case where the dark count parameter $\delta$ is known but is not negligible. Then, the Fisher information matrix is given by

$$\begin{pmatrix} \frac{2\lambda^2 t_1}{3(\lambda(2F+1)+6\delta)} + \frac{2\lambda^2 t_2}{3(\lambda(2-2F)+6\delta)} & \frac{\lambda(2F+1)}{3(\lambda(2F+1)+6\delta)}t_1 - \frac{\lambda(2-2F)}{3(\lambda(2-2F)+6\delta)}t_2 \\[1.5ex] \frac{\lambda(2F+1)}{3(\lambda(2F+1)+6\delta)}t_1 - \frac{\lambda(2-2F)}{3(\lambda(2-2F)+6\delta)}t_2 & \frac{2F+1}{\lambda(2F+1)+6\delta}\frac{2F+1}{6}t_1 + \frac{2-2F}{\lambda(2-2F)+6\delta}\frac{2-2F}{6}t_2 + \frac{t_3}{\lambda} \end{pmatrix}. \tag{65}$$

Hence, from (46), the inverse of the minimum variance is equal to

$$f(t_1,t_2,t_3) := \frac{2\lambda^2 t_1}{3(\lambda(2F+1)+6\delta)} + \frac{2\lambda^2 t_2}{3(\lambda(2-2F)+6\delta)} - \frac{\lambda\left(\frac{\lambda(2F+1)}{3(\lambda(2F+1)+6\delta)}t_1 - \frac{\lambda(2-2F)}{3(\lambda(2-2F)+6\delta)}t_2\right)^2}{\frac{\lambda(2F+1)}{\lambda(2F+1)+6\delta}\frac{2F+1}{6}t_1 + \frac{\lambda(2-2F)}{\lambda(2-2F)+6\delta}\frac{2-2F}{6}t_2 + t_3}.$$

Then, we apply the lemmas in Appendix A to $f(t_1,t_2,t_3)$ with $a = \frac{2\lambda^2}{3(\lambda(2F+1)+6\delta)}$, $b = \frac{2\lambda^2}{3(\lambda(2-2F)+6\delta)}$, $c = \frac{\lambda(2F+1)}{\lambda(2F+1)+6\delta}\frac{2F+1}{6}$, $d = \frac{\lambda(2-2F)}{\lambda(2-2F)+6\delta}\frac{2-2F}{6}$, and obtain the optimized values:

(i) coincidence and total flux

$$\max_{t_1+t_3=t} f(t_1,0,t_3) = \frac{4\lambda^2 t}{\left((2F+1)\sqrt{\lambda} + \sqrt{6(\lambda(2F+1)+6\delta)}\right)^2}, \tag{66}$$

(ii) anti-coincidence and total flux

$$\max_{t_2+t_3=t} f(0,t_2,t_3) = \frac{4\lambda^2 t}{\left((2-2F)\sqrt{\lambda} + \sqrt{6(\lambda(2-2F)+6\delta)}\right)^2}, \tag{67}$$

and

(iii) coincidence and anti-coincidence

$$\max_{t_1+t_2=t} f(t_1,t_2,0) = \frac{6\lambda^2 t}{\left((2F+1)\sqrt{\lambda(2-2F)+6\delta} + (2-2F)\sqrt{\lambda(2F+1)+6\delta}\right)^2}. \tag{68}$$

The ratio of (68) to (66) is

$$\frac{3}{2}\left(\frac{(2F+1)\sqrt{\lambda} + \sqrt{6(\lambda(2F+1)+6\delta)}}{(2F+1)\sqrt{\lambda(2-2F)+6\delta} + (2-2F)\sqrt{\lambda(2F+1)+6\delta}}\right)^2 > 1, \tag{69}$$

where the final inequality is derived in Appendix B. Therefore, the measurement using the coincidence and the anti-coincidence provides a better test than that using the coincidence and the total flux, as in the case of $\delta = 0$.

Define $\delta_1$ and the critical point $F_{\delta'}$ for the normalized dark count $\delta' := 6\delta/\lambda < \delta_1$ as

$$\sqrt{\delta_1+3} - \sqrt{\delta_1} = \sqrt{3/2},$$

$$\sqrt{1+2F_{\delta'}+\delta'} - \sqrt{2-2F_{\delta'}+\delta'} = \sqrt{3/2}.$$

The parameter $\delta_1$ is calculated to be 0.375. As shown in Appendix C, the measurement using the coincidence and the anti-coincidence provides a better test than that using the anti-coincidence and the total flux, if the fidelity is smaller than the critical point $F_{\delta'}$:

$$\max_{t_1+t_2+t_3=t} f(t_1,t_2,t_3) = \begin{cases} \dfrac{4\lambda^2 t}{\left((2-2F)\sqrt{\lambda}+\sqrt{6(\lambda(2-2F)+6\delta)}\right)^2} & \text{if } F > F_{\delta'}, \\[2ex] \dfrac{6\lambda^2 t}{\left((2F+1)\sqrt{\lambda(2-2F)+6\delta} + (2-2F)\sqrt{\lambda(2F+1)+6\delta}\right)^2} & \text{otherwise.} \end{cases} \tag{70}$$

The optimal time allocation is given by $t_1 = 0$,

$$t_2 = \frac{t\sqrt{6(\lambda(2-2F)+6\delta)}}{\sqrt{6(\lambda(2-2F)+6\delta)} + (2-2F)\sqrt{\lambda}}, \qquad t_3 = \frac{t(2-2F)\sqrt{\lambda}}{\sqrt{6(\lambda(2-2F)+6\delta)} + (2-2F)\sqrt{\lambda}}$$

for $F > F_{\delta'}$, and by

$$t_1 = \frac{t(2-2F)\sqrt{\lambda(2F+1)+6\delta}}{(2-2F)\sqrt{\lambda(2F+1)+6\delta} + (2F+1)\sqrt{\lambda(2-2F)+6\delta}}, \qquad t_2 = \frac{t(2F+1)\sqrt{\lambda(2-2F)+6\delta}}{(2-2F)\sqrt{\lambda(2F+1)+6\delta} + (2F+1)\sqrt{\lambda(2-2F)+6\delta}},$$

$t_3 = 0$ for $F \le F_{\delta'}$. The critical point $F_{\delta'}$ for optimal time allocation increases with the normalized dark count as illustrated in Fig. 3.

FIG. 3: The critical point $F_{\delta'}$ for optimal time allocation as a function of normalized dark counts $\delta'$.
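The dependence of the critical point on the normalized dark count can be reproduced numerically. The following Python sketch solves the defining equation $\sqrt{1+2F+\delta'}-\sqrt{2-2F+\delta'}=\sqrt{3/2}$ for $F$ by bisection. The function name, the tolerance handling, and the convention of returning `None` when no root exists in $[0,1]$ are assumptions of this example.

```python
import math

def critical_fidelity_dark(delta_norm, tol=1e-12):
    """Solve sqrt(1+2F+d') - sqrt(2-2F+d') = sqrt(3/2) for F in [0, 1].

    delta_norm is the normalized dark count d' = 6*delta/lambda.
    Returns None when the equation has no root in [0, 1] (d' above 0.375).
    """
    target = math.sqrt(1.5)

    def g(f):
        return (math.sqrt(1 + 2 * f + delta_norm)
                - math.sqrt(2 - 2 * f + delta_norm) - target)

    if g(1.0) < -1e-12:
        return None
    lo, hi = 0.0, 1.0  # g is increasing in f and g(0) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for d in (0.0, 0.1, 0.2, 0.3, 0.375):
        print(d, critical_fidelity_dark(d))
```

The root moves from about 0.8995 at $\delta'=0$ toward 1 as $\delta'$ approaches 0.375, beyond which the coincidence and anti-coincidence pair is preferred for every fidelity.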



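VIII. DESIGN II (λ: KNOWN, ONE STAGE)

In this section, we consider the case where $\lambda$ is known. Then, the Fisher information is

$$\frac{2\lambda^2 t_1}{3(\lambda(2F+1)+6\delta)} + \frac{2\lambda^2 t_2}{3(\lambda(2-2F)+6\delta)}. \tag{71}$$

The maximum value is calculated as

$$\max_{t_1+t_2+t_3=t}\left[\frac{2\lambda^2 t_1}{3(\lambda(2F+1)+6\delta)} + \frac{2\lambda^2 t_2}{3(\lambda(2-2F)+6\delta)}\right] = \begin{cases} \dfrac{2\lambda^2 t}{3(\lambda(2F+1)+6\delta)} & \text{if } F < \frac{1}{4}, \\[2ex] \dfrac{2\lambda^2 t}{3(\lambda(2-2F)+6\delta)} & \text{if } F \ge \frac{1}{4}. \end{cases} \tag{72}$$

The above optimization shows that when $F > \frac{1}{4}$, the count on anti-coincidence ($t_1 = 0$; $t_2 = t$; $t_3 = 0$) is better than the count on coincidence ($t_1 = t$; $t_2 = 0$; $t_3 = 0$). In fact, Barbieri et al. measured the sum of the counts on the anti-coincidence vectors $|HV\rangle$, $|VH\rangle$, $|DX\rangle$, $|XD\rangle$, $|RR\rangle$, $|LL\rangle$ to realize the entanglement witness in their experiment. In this case, the variance is $\frac{3(\lambda(2-2F)+6\delta)}{2\lambda^2 t}$. When we observe the sum of counts on anti-coincidence $n_2$, the estimated value of $F$ is given by $1 + \frac{3}{\lambda}\left(\delta - \frac{n_2}{t}\right)$, which is the solution of $\left(\lambda\frac{2-2F}{6} + \delta\right)t = n_2$. The likelihood ratio test with the risk probability $\alpha$ can be approximated by the test with the rejection region

$$\left\{ n_2 \,\middle|\, n_2 \le \left(\frac{\lambda(1-F_0)}{3} + \delta\right)t + \Phi^{-1}(\alpha)\sqrt{\left(\frac{\lambda(1-F_0)}{3} + \delta\right)t} \right\}$$

concerning the null hypothesis $H_0 : F \le F_0$, which is also the UMP test. The p-value of the likelihood ratio tests is

$$\Phi\left(\frac{n_2 - \left(\frac{\lambda(1-F_0)}{3} + \delta\right)t}{\sqrt{\left(\frac{\lambda(1-F_0)}{3} + \delta\right)t}}\right).$$

When $F < \frac{1}{4}$, the optimal time allocation is $t_1 = t$, $t_2 = t_3 = 0$. The fidelity is estimated by $\frac{3}{\lambda}\left(\frac{n_1}{t} - \delta\right) - \frac{1}{2}$. Its variance is $\frac{3(\lambda(2F+1)+6\delta)}{2\lambda^2 t}$. The likelihood ratio test with the risk probability $\alpha$ of the Poisson distribution is almost equal to the test with the rejection region

$$\left\{ n_1 \,\middle|\, n_1 \ge \left(\frac{\lambda(1+2F_0)}{6} + \delta\right)t + \Phi^{-1}(1-\alpha)\sqrt{\left(\frac{\lambda(1+2F_0)}{6} + \delta\right)t} \right\}$$

concerning the null hypothesis $H_0 : F \le F_0$, which is also the UMP test. The p-value of the likelihood ratio tests is

$$\Phi\left(\frac{\left(\frac{\lambda(1+2F_0)}{6} + \delta\right)t - n_1}{\sqrt{\left(\frac{\lambda(1+2F_0)}{6} + \delta\right)t}}\right).$$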
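For Design II with the whole time spent on the anti-coincidence vectors, the estimate and the approximate p-value follow directly from the formulas above. The Python sketch below is a minimal illustration under the normal approximation; the function names are hypothetical and the numerical values in the usage example are invented for demonstration.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def design2_anticoincidence(n2, lam, delta, t, f0):
    """Return (fidelity estimate, approximate p-value) for H0: F <= f0.

    n2: total anti-coincidence count observed during time t,
    lam: known flux parameter, delta: known dark count rate.
    """
    f_hat = 1.0 + 3.0 * (delta - n2 / t) / lam
    mean0 = (lam * (1.0 - f0) / 3.0 + delta) * t  # expected count at F = f0
    z = (n2 - mean0) / math.sqrt(mean0)
    return f_hat, normal_cdf(z)

if __name__ == "__main__":
    print(design2_anticoincidence(80, 300.0, 1.0, 10.0, 0.9))
```

Fewer anti-coincidence counts than expected at the threshold give a negative $z$ and hence a small p-value, in line with the lower-tail rejection region.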



IX. COMPARISON OF THE ASYMPTOTIC VARIANCES

We compare the asymptotic variances of the following designs for time allocation, when the dark count parameter $\delta$ is zero.

(i) Modified visibility: The asymptotic variance is $\frac{(2F+1)(2-2F)}{\lambda t}$.

(iia) Design I ($\lambda$ unknown), optimal time allocation between the counts on anti-coincidence and coincidence: The asymptotic variance is $\frac{(2F+1)(2-2F)\left(\sqrt{2F+1}+\sqrt{2-2F}\right)^2}{6\lambda t}$.

(iib) Design I ($\lambda$ unknown), optimal time allocation between the counts on anti-coincidence and the total flux: The asymptotic variance is $\frac{(2-2F)\left(\sqrt{6}+\sqrt{2-2F}\right)^2}{4\lambda t}$.

(iiia) Design II ($\lambda$ known), estimation from the count on anti-coincidence: The asymptotic variance is $\frac{3(2-2F)}{2\lambda t}$.

(iiib) Design II ($\lambda$ known), estimation from the count on coincidence: The asymptotic variance is $\frac{3(2F+1)}{2\lambda t}$.

Fig. 4 shows the comparison, where the asymptotic variances in (iia)-(iiib) are normalized by the one in (i). The anti-coincidence measurement provides the best estimation for high fidelity ($F > 0.25$). When $\lambda$ is unknown, the measurement with the counts on anti-coincidence and coincidence is better than that with the counts on anti-coincidence and the total flux for $F < 0.899519$. For higher fidelity, the counts on anti-coincidence and total flux turn out to be better, but the difference is small.



X. DESIGN III (λ: KNOWN, TWO STAGE)

A. Optimal Allocation

The comparison in the previous section shows that the measurement on the anti-coincidence vectors yields a better variance than the measurement on the coincidence vectors, when the fidelity is greater than 1/4 and the parameters $\lambda$ and $\delta$ are known. We will explore further improvement in the measurement on the anti-coincidence vectors. In the previous sections, we allocated an equal time to the measurement on each of the anti-coincidence vectors. Here we minimize the variance by optimizing the time allocation $t_{HV}$, $t_{VH}$, $t_{DX}$, $t_{XD}$, $t_{RR}$, and $t_{LL}$ between the anti-coincidence vectors $B = \{|HV\rangle, |VH\rangle, |DX\rangle, |XD\rangle, |RR\rangle, |LL\rangle\}$, under the restriction of the total measurement time: $\sum_{(x,y)\in B} t_{x,y} = t$. The number of counts $n_{xy}$ obeys the Poisson distribution $\mathrm{Poi}((\lambda\mu_{xy} + \delta)t_{xy})$ with unknown parameter $\mu_{xy}$. Then, the Fisher information matrix is the diagonal matrix with the diagonal elements $\left\{\frac{\lambda^2 t_{x,y}}{\lambda\mu_{x,y}+\delta}\right\}_{(x,y)\in B}$. Since we are interested in the parameter $1 - F = \frac{1}{2}\left(\sum_{(x,y)\in B}\mu_{x,y}\right)$, the variance is given by

$$\frac{1}{4}\sum_{(x,y)\in B}\frac{\lambda\mu_{x,y}+\delta}{\lambda^2 t_{x,y}}, \tag{73}$$

as mentioned in section IV A.

FIG. 4: Comparison of the designs for time allocation. The asymptotic variances normalized by the value of the modified visibility method are shown as a function of fidelity, where dots: (iia), solid: (iib), thick: (iiia), and dash: (iiib).

Under the restriction of the total measurement time, the minimum value of (73) is

$$\frac{\left(\sum_{(x,y)\in B}\sqrt{\lambda\mu_{x,y}+\delta}\right)^2}{4\lambda^2 t}, \tag{74}$$

which is attained by the optimal time allocation

$$t_{x,y} = \frac{\sqrt{\lambda\mu_{x,y}+\delta}}{\sum_{(x',y')\in B}\sqrt{\lambda\mu_{x',y'}+\delta}}\;t, \tag{75}$$

which is called Neyman allocation and is used in sampling design [13]. The variance with the equal allocation is

$$\frac{3(\lambda(2-2F)+6\delta)}{2\lambda^2 t} = \frac{3\left(\lambda\left(\sum_{(x,y)\in B}\mu_{x,y}\right)+6\delta\right)}{2\lambda^2 t}. \tag{76}$$

The inequality (74) $\le$ (76) can be derived from Schwarz's inequality for the vectors $(1,\ldots,1)$ and $\left(\sqrt{\lambda\mu_{HV}+\delta},\ldots,\sqrt{\lambda\mu_{LL}+\delta}\right)$. The equality holds if and only if $\mu_{HV} = \mu_{VH} = \mu_{DX} = \mu_{XD} = \mu_{RR} = \mu_{LL}$. Therefore, the Neyman allocation has an advantage over the equal allocation, when there is a bias in the parameters $\mu_{HV}, \mu_{VH}, \mu_{DX}, \mu_{XD}, \mu_{RR}, \mu_{LL}$. In other words, the Neyman allocation is effective when the expectation values of the counts on some vectors are larger than those on other vectors.
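The gain of the Neyman allocation (75) over the equal allocation can be illustrated numerically. The Python sketch below computes the allocation, evaluates the variance (73) for any allocation, and compares the results with the closed forms (74) and (76). Function names and the numerical parameter values are assumptions chosen for illustration.

```python
import math

def neyman_allocation(mu, lam, delta, t):
    """Times proportional to sqrt(lam * mu_xy + delta), summing to t (eq. 75)."""
    weights = [math.sqrt(lam * m + delta) for m in mu]
    total = sum(weights)
    return [t * w / total for w in weights]

def variance(mu, lam, delta, times):
    """Variance (73) of the estimate of 1 - F for a given time allocation."""
    return 0.25 * sum(
        (lam * m + delta) / (lam ** 2 * tx) for m, tx in zip(mu, times)
    )

if __name__ == "__main__":
    # Hypothetical anisotropic parameters for HV, VH, DX, XD, RR, LL
    mu = [0.02, 0.02, 0.05, 0.05, 0.01, 0.01]
    lam, delta, t = 1000.0, 2.0, 60.0
    equal = [t / len(mu)] * len(mu)
    print(variance(mu, lam, delta, neyman_allocation(mu, lam, delta, t)))
    print(variance(mu, lam, delta, equal))
```

When all $\mu_{x,y}$ are equal the two allocations coincide, so the improvement comes entirely from the anisotropy of the error.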



B. Two-stage Method 

The optimal time allocation derived above is not applicable in the experiment, because it depends on the unknown parameters $\mu_{HV}$, $\mu_{VH}$, $\mu_{DX}$, $\mu_{XD}$, $\mu_{RR}$, and $\mu_{LL}$. In order to resolve this problem, we introduce a two-stage method, where the total measurement time $t$ is divided into $t_f$ for the first stage and $t_s$ for the second stage under the condition of $t = t_f + t_s$. In the first stage, we measure the counts on each vector for $t_f/6$ and estimate the expectation values for the Neyman allocation on the measurement time $t_s$. In the second stage, we measure the counts on a vector $|x_A y_B\rangle$ according to the estimated Neyman allocation. The two-stage method is formulated as follows.

(i) The measurement time for each vector in the first stage is given by $t_f/6$.

(ii) In the second stage, we measure the counts on a vector $|x_A y_B\rangle$ with the measurement time $t_{xy}$ defined as

$$t_{xy} = \frac{\sqrt{m_{xy}}}{\sum_{(x',y')\in B}\sqrt{m_{x'y'}}}\,(t - t_f),$$

where $m_{xy}$ is the observed count in the first stage.

(iii) Define $\hat{\mu}_{xy}$ and $\hat{F}$ as

$$\hat{\mu}_{xy} = \frac{n_{xy}}{\lambda t_{xy}}, \qquad \hat{F} = 1 - \frac{1}{2}\sum_{(x,y)\in B}\hat{\mu}_{x,y},$$

where $n_{x,y}$ is the number of counts on $|x_A y_B\rangle$ for $t_{xy}$. Then, we can estimate the fidelity by $\hat{F}$.

(iv) Finally, we apply the test $\phi_{\mathrm{mod},\alpha}$ given in Section IV D to the two hypotheses given as

$$H_0 : w\cdot\vec{\mu} \ge c_0 \quad \text{versus} \quad H_1 : w\cdot\vec{\mu} < c_0, \tag{77}$$

where $w_{x,y} := \frac{1}{2}$ and $c_0 := 1 - F_0$.
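Steps (ii) and (iii) of the two-stage method can be sketched as follows. The Python code below distributes the second-stage time in proportion to the square roots of the first-stage counts and then forms the fidelity estimate from the second-stage counts. It is a sketch under explicit assumptions: the function names are hypothetical, dark counts are neglected in the estimator $\hat{\mu}_{xy} = n_{xy}/(\lambda t_{xy})$, and every first-stage count is assumed to be positive.

```python
import math

def second_stage_times(first_counts, t_total, t_first):
    """Estimated Neyman allocation of the remaining time t_total - t_first."""
    if any(m <= 0 for m in first_counts.values()):
        raise ValueError("this sketch assumes positive first-stage counts")
    roots = {key: math.sqrt(m) for key, m in first_counts.items()}
    total = sum(roots.values())
    return {key: (t_total - t_first) * r / total for key, r in roots.items()}

def estimate_fidelity(second_counts, times, lam):
    """F_hat = 1 - (1/2) * sum of n_xy / (lam * t_xy), dark counts neglected."""
    mu_hat = {key: second_counts[key] / (lam * times[key]) for key in second_counts}
    return 1.0 - 0.5 * sum(mu_hat.values())

if __name__ == "__main__":
    first = {"HV": 4, "VH": 4, "DX": 16, "XD": 16, "RR": 1, "LL": 1}
    times = second_stage_times(first, t_total=20.0, t_first=6.0)
    second = {"HV": 4, "VH": 4, "DX": 20, "XD": 20, "RR": 1, "LL": 1}
    print(times)
    print(estimate_fidelity(second, times, lam=100.0))
```

Vectors that produced more counts in the first stage receive more time in the second stage, which mirrors the square-root rule of (75) with the unknown means replaced by observed counts.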



XI. CONCLUSION 

We have formulated a hypothesis testing scheme to test entanglement in the Poisson distribution framework. Our statistical method can handle the fluctuation in the experimental data more properly in a realistic setting. It has been shown that the optimal time allocation improves the test: the measurement time should be allocated preferentially to the anti-coincidence vectors. This test is valid even if the dark count exists. This design is particularly useful for the experimental test, because the optimal time allocation depends only on the threshold of the test. We do not need any further information on the probability distribution or the tested state. The test can be further improved by optimizing the time allocation between the anti-coincidence vectors, when the error from the maximally entangled state is anisotropic. However, this time allocation requires the expectation values of the counts on the anti-coincidence vectors, so that we need to apply the two-stage method.



APPENDIX A: OPTIMIZATION OF FISHER INFORMATION

In this section, we maximize the quantities appearing in Fisher information.

Lemma 1 The equation

$$\max_{t_1,t_3\ge 0,\ t_1+t_3=t}\left[a t_1 - \frac{ac\,t_1^2}{ct_1+t_3}\right] = \frac{at}{(\sqrt{c}+1)^2} \tag{A1}$$

holds, and the maximum value is attained when $t_1 = \frac{t}{\sqrt{c}+1}$, $t_3 = \frac{\sqrt{c}\,t}{\sqrt{c}+1}$.

Proof: Letting $x := ct_1 + t_3$, we have $t_1 = \frac{x-t}{c-1}$. Then,

$$a t_1 - \frac{ac\,t_1^2}{ct_1+t_3} = \frac{a}{(c-1)^2}\left(-x + (c+1)t - \frac{ct^2}{x}\right).$$

Hence, the maximum is attained at $x = \sqrt{c}\,t$, i.e., $t_1 = \frac{t}{\sqrt{c}+1}$ and $t_3 = \frac{\sqrt{c}\,t}{\sqrt{c}+1}$. Thus,

$$\max_{t_1,t_3\ge 0,\ t_1+t_3=t}\left[a t_1 - \frac{ac\,t_1^2}{ct_1+t_3}\right] = \frac{a}{(c-1)^2}\left(-2\sqrt{c}\,t + (c+1)t\right) = \frac{at}{(\sqrt{c}+1)^2}. \;\blacksquare$$

Lemma 2 The equation

$$\max_{t_1,t_2\ge 0,\ t_1+t_2=t}\left[a t_1 + b t_2 - \frac{\left(\sqrt{ac}\,t_1 - \sqrt{bd}\,t_2\right)^2}{ct_1+dt_2}\right] = \frac{t\left(\sqrt{ad}+\sqrt{bc}\right)^2}{\left(\sqrt{c}+\sqrt{d}\right)^2} \tag{A2}$$

holds, and this maximum value is attained when $t_1 = \frac{t\sqrt{d}}{\sqrt{c}+\sqrt{d}}$, $t_2 = \frac{t\sqrt{c}}{\sqrt{c}+\sqrt{d}}$.

Proof: Letting $x := ct_1 + dt_2$, we have $t_1 = \frac{x-dt}{c-d}$ and $t_2 = \frac{ct-x}{c-d}$. Then,

$$a t_1 + b t_2 - \frac{\left(\sqrt{ac}\,t_1 - \sqrt{bd}\,t_2\right)^2}{ct_1+dt_2} = \left(\frac{\sqrt{ad}+\sqrt{bc}}{d-c}\right)^2\left((c+d)t - x - \frac{cd\,t^2}{x}\right).$$

Hence, the maximum is attained at $x = \sqrt{cd}\,t$, i.e., $t_1 = \frac{t\sqrt{d}}{\sqrt{c}+\sqrt{d}}$ and $t_2 = \frac{t\sqrt{c}}{\sqrt{c}+\sqrt{d}}$. Thus,

$$\max_{t_1,t_2\ge 0,\ t_1+t_2=t}\left[a t_1 + b t_2 - \frac{\left(\sqrt{ac}\,t_1 - \sqrt{bd}\,t_2\right)^2}{ct_1+dt_2}\right] = \left(\frac{\sqrt{ad}+\sqrt{bc}}{d-c}\right)^2\left((c+d)t - 2\sqrt{cd}\,t\right) = \frac{t\left(\sqrt{ad}+\sqrt{bc}\right)^2}{\left(\sqrt{c}+\sqrt{d}\right)^2}. \;\blacksquare$$

Further, the three-parameter case can be maximized as follows.

Lemma 3 The maximum value

$$\max_{t_1,t_2,t_3\ge 0,\ t_1+t_2+t_3=t}\left[a t_1 + b t_2 - \frac{\left(\sqrt{ac}\,t_1 - \sqrt{bd}\,t_2\right)^2}{ct_1+dt_2+t_3}\right]$$

is equal to the maximum among the three values

$$\max_{t_1,t_3\ge 0,\ t_1+t_3=t}\left[a t_1 - \frac{ac\,t_1^2}{ct_1+t_3}\right], \quad \max_{t_2,t_3\ge 0,\ t_2+t_3=t}\left[b t_2 - \frac{bd\,t_2^2}{dt_2+t_3}\right], \quad \max_{t_1,t_2\ge 0,\ t_1+t_2=t}\left[a t_1 + b t_2 - \frac{\left(\sqrt{ac}\,t_1 - \sqrt{bd}\,t_2\right)^2}{ct_1+dt_2}\right].$$

Proof: Define two parameters $x := ct_1 + dt_2 + t_3$ and $y := \sqrt{ac}\,t_1 - \sqrt{bd}\,t_2$. Then, the range of $x$ and $y$ forms a convex set, whose boundary corresponds to the allocations in which at least one of $t_1$, $t_2$, $t_3$ is zero. Since

$$t_1 = \frac{\sqrt{bd}\,(x-t) + (d-1)y}{\sqrt{bd}\,(c-1) + \sqrt{ac}\,(d-1)}, \qquad t_2 = \frac{\sqrt{ac}\,(x-t) - (c-1)y}{\sqrt{bd}\,(c-1) + \sqrt{ac}\,(d-1)},$$

we have

$$a t_1 + b t_2 - \frac{\left(\sqrt{ac}\,t_1 - \sqrt{bd}\,t_2\right)^2}{ct_1+dt_2+t_3} = A(x-t) + By - \frac{y^2}{x} = -\frac{1}{x}\left(y - \frac{1}{2}Bx\right)^2 + \left(\frac{B^2}{4} + A\right)x - At,$$

where

$$A := \frac{a\sqrt{bd} + b\sqrt{ac}}{\sqrt{bd}\,(c-1) + \sqrt{ac}\,(d-1)}, \qquad B := \frac{a(d-1) - b(c-1)}{\sqrt{bd}\,(c-1) + \sqrt{ac}\,(d-1)}.$$

Applying Lemma 4, we obtain this lemma. ■

Lemma 4 Define the function $f(x,y) := -\frac{1}{x}(y - \alpha x)^2 + \beta x$ on a closed convex set $C$. The maximum value is realized at the boundary $\mathrm{bd}\,C$.

Proof: The condition can be classified into two cases: i) $\mathrm{bd}\,C \cap \{y = \alpha x\} = \emptyset$, ii) $\mathrm{bd}\,C \cap \{y = \alpha x\} \ne \emptyset$. In the case i), when $x$ is fixed, $\max_{y:(x,y)\in C} f(x,y) = \max_{y:(x,y)\in \mathrm{bd}\,C} f(x,y)$. Then, we obtain $\max_{(x,y)\in C} f(x,y) = \max_{(x,y)\in \mathrm{bd}\,C} f(x,y)$. In the case ii), when $(x,\alpha x) \in C$, $\max_{y:(x,y)\in C} f(x,y) = f(x,\alpha x) = \beta x$. Hence, $\max_{x:(x,\alpha x)\in C}\max_{y:(x,y)\in C} f(x,y) = \max_{x:(x,\alpha x)\in C}\beta x$. This maximum is attained at $x = \max\{x \mid (x,\alpha x)\in C\}$ or $x = \min\{x \mid (x,\alpha x)\in C\}$. These points belong to the boundary $\mathrm{bd}\,C$. Further, $\max_{x:(x,\alpha x)\notin C}\max_{y:(x,y)\in C} f(x,y) = \max_{x:(x,\alpha x)\notin C}\max_{y:(x,y)\in \mathrm{bd}\,C} f(x,y)$. Thus, the proof is completed. ■
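Lemma 2 can be checked numerically by comparing its closed form with a brute-force search over $t_1$. The Python sketch below does this for arbitrary positive constants; the function names and the grid resolution are assumptions made for this check, which illustrates the lemma rather than replacing its proof.

```python
import math

def lemma2_objective(t1, t, a, b, c, d):
    """Objective of Lemma 2 with t2 = t - t1."""
    t2 = t - t1
    cross = math.sqrt(a * c) * t1 - math.sqrt(b * d) * t2
    return a * t1 + b * t2 - cross ** 2 / (c * t1 + d * t2)

def lemma2_closed_form(t, a, b, c, d):
    """Return (maximum value, maximizing t1) as stated in Lemma 2."""
    root_sum = math.sqrt(c) + math.sqrt(d)
    value = t * (math.sqrt(a * d) + math.sqrt(b * c)) ** 2 / root_sum ** 2
    return value, t * math.sqrt(d) / root_sum

if __name__ == "__main__":
    a, b, c, d, t = 2.0, 3.0, 0.5, 1.5, 1.0
    value, t1_star = lemma2_closed_form(t, a, b, c, d)
    grid_max = max(
        lemma2_objective(i * t / 1000.0, t, a, b, c, d) for i in range(1, 1000)
    )
    print(value, lemma2_objective(t1_star, t, a, b, c, d), grid_max)
```

The agreement reflects the identity that the objective equals $(\sqrt{ad}+\sqrt{bc})^2\, t_1 t_2/(ct_1+dt_2)$, which is maximized at the stated $t_1$.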



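APPENDIX B: PROOF OF INEQUALITIES (61) AND (69)

It is sufficient to show

$$\sqrt{\tfrac{3}{2}}\left((2F+1)\sqrt{\lambda} + \sqrt{6(\lambda(2F+1)+6\delta)}\right) - \left((2F+1)\sqrt{\lambda(2-2F)+6\delta} + (2-2F)\sqrt{\lambda(2F+1)+6\delta}\right) > 0. \tag{B1}$$

By putting $\delta' := \frac{6\delta}{\lambda}$, the LHS is evaluated as

$$\frac{\text{LHS of (B1)}}{\sqrt{\lambda}} = \sqrt{\tfrac{3}{2}}(2F+1) + 3\sqrt{(2F+1)+\delta'} - (2F+1)\sqrt{(2-2F)+\delta'} - (2-2F)\sqrt{(2F+1)+\delta'}$$

$$= \sqrt{\tfrac{3}{2}}(2F+1) + (2F+1)\sqrt{(2F+1)+\delta'} - (2F+1)\sqrt{(2-2F)+\delta'}$$

$$= (2F+1)\left(\sqrt{\tfrac{3}{2}} + \sqrt{(2F+1)+\delta'} - \sqrt{(2-2F)+\delta'}\right).$$

Since $0 \le F \le 1$, we have

$$\sqrt{\tfrac{3}{2}} + \sqrt{(2F+1)+\delta'} - \sqrt{(2-2F)+\delta'} \ge \sqrt{\tfrac{3}{2}} + \sqrt{1+\delta'} - \sqrt{2+\delta'}.$$

Further, the function $\delta' \mapsto \sqrt{1+\delta'} - \sqrt{2+\delta'}$ $(\delta' \in [0,\infty))$ has the minimum $\sqrt{1} - \sqrt{2} > -1 > -\sqrt{\tfrac{3}{2}}$ at $\delta' = 0$. Hence, $\frac{\text{LHS of (B1)}}{\sqrt{\lambda}} > 0$.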

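APPENDIX C: PROOF OF EQUATIONS (62) AND (70)

It is sufficient to show that

$$\sqrt{\tfrac{3}{2}}\left((2-2F)\sqrt{\lambda} + \sqrt{6(\lambda(2-2F)+6\delta)}\right) - \left((2F+1)\sqrt{\lambda(2-2F)+6\delta} + (2-2F)\sqrt{\lambda(2F+1)+6\delta}\right) \le 0 \tag{C1}$$

if and only if $\delta' \le \delta_1$ and $F \ge F_{\delta'}$. By putting $\delta' := \frac{6\delta}{\lambda}$, the LHS of (C1) is evaluated as

$$\frac{\text{LHS of (C1)}}{\sqrt{\lambda}} = \sqrt{\tfrac{3}{2}}(2-2F) + 3\sqrt{(2-2F)+\delta'} - (2F+1)\sqrt{(2-2F)+\delta'} - (2-2F)\sqrt{(2F+1)+\delta'}$$

$$= \sqrt{\tfrac{3}{2}}(2-2F) + (2-2F)\sqrt{(2-2F)+\delta'} - (2-2F)\sqrt{(2F+1)+\delta'}$$

$$= (2-2F)\left(\sqrt{\tfrac{3}{2}} + \sqrt{(2-2F)+\delta'} - \sqrt{(2F+1)+\delta'}\right).$$

Since $0 \le F \le 1$ and $\delta \ge 0$,

$$\sqrt{\tfrac{3}{2}} + \sqrt{(2-2F)+\delta'} - \sqrt{(2F+1)+\delta'} \le 0$$

if and only if $\delta_1 \ge \delta'$ and $F \ge F_{\delta'}$.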



APPENDIX D: PROOF OF (gHJ 



Define by 
_min D(Poi(0,...,0,Mi,0,...,0)||Poi(//)) = R- 

W-fl' — CQ 

In fact, when wta < cq, 

min D(Poi(0, . . . , 0, a, 0, . . . , 0) || Poi(/2')) 

/i^>0: w-fl'—CQ 



fij-a + a log — 



u'->0: w-u'—cq — ' 
.3=1 



; min a + fi — a + a log — 

a>0,/3>0: ma+fi M 0=co a 
f£!l_ a + a log^ if a > £o(!£M^£i) 

- J Wi ° CO — WMWi 

} ^ + a l 0g mM^ if a < 



it' A/ 
This value is monotone decreasing concerning a. When

= c (w M - m ) m j . i _ c^"-™-) log 

Hence, the value Mi coincides with the the value Mi de- 
fined by (|5FJ|) and (g^ . 

Thus, the relation (|38f) follows from the relation 

jiiin Z)(Poi(piai, . . . ,p m a m )||Poi(/2')) 

W-j.L' — Cq 
771 

<V Pl min D(Poi(0,...,0,a i ,0,---,0)||Poi(//)). 



We choose /!■ such that 

£(Poi(0,...,0,a l ,0,...,0)||Poi(/2')) 
D(Poi(0, . . . , 0, Oi, 0, . . . , 0)||Poi(/Z<)). Then, the above 
inequality follows from Lemma [S] in the following way: 

m 

Vp, ^min D(Poi(0,... ! 0,ai,0,...,0)||Poi(M')) 

' ■* w-£t'=co 
i—1 

m 

= PiD(Poi(0, . . . , 0, a l7 0, . . . , 0)||Poi(#)) 

m 

>D(Poi(piai, . . . ,p ro a m )||Poi(2^pi/^)) 

i=l 

> min D(Poi(piai, . . . ,p m a m )||Poi(/2')). 



The convexity of — log implies that 



iog( — + ^—iMl ) 



log( 



+ (i - P )^) 

pi/i jUj , (1-pK Mi- 



(p^i + (i - pK) ^ (pi/j + (i - p)w •) ^ 



< fog(^) 



^-PM fog(^) 

(p^ + (l-pK) gK v> } 



= Hence, 



p(^log^) + (l-p)£>>g^) 
^ i=i Pi 



i=l 



: £(p^ + (i_pX) 



^ log(^) 



0-M 



>-j2(pui+(i- P x)iog 



(pvi + (1 - pK) 

(p/ii + (i - pK) ' 



Lemma 5 Any rea/ number < p < 1 and any /ottr 
sequence of positive numbers (p-i), {vi), (mO; and (v^) 
satisfy 



i = l 



Mi ' 



111 111 J 

(i-p)(^(m--^) + E^ 1o s3- 



1=1 



M, 



> Eftw* + f 1 - p)*® - m + c 1 - 

«=1 

+ f:^ + (i-p),mo g ^tn"11v 

(PMi + I 1 -PiMi) 



Proof: It is sufficient to show 



APPENDIX E: PROOF OF (gHJ) 



Considering the shape of the graph 



a, we can 



show that the minimum value min^ , y )^c can De a t- 
tained by the boundary of C. Hence the boundary of 
the convex set Co{R) is included by the union Ui-^jhj 
of the lines = {(tXi(R) + (1 - t)xj(R),tyi(R) '+ 
(1 - t)yj(R))\6 < t < 1}. Taking the derivative of 



y/t yi (R) + (l-t) yj (R) 



concerning t, we obtain 



p(£ log £) + (1 log 

1=1 i—1 L 

>f:^ +( i-pK)iog r ( ^+i!-^i ) v 

^ (w + (!-p)Mi) 



imu te< (fi) + (i-t) aj (jz) = ^ (E1) 



te[o,i] yJt yi (R) + {l-t) yj (R) 



Hence, we obtain (l4"5)l . 